Aims

This course is intended to show the power and range of probability by considering real
examples in which probabilistic modelling is inescapable and useful. Theory will be 
developed as required to deal with the examples. 

Synopsis 

Poisson processes and birth processes. Continuous-time Markov chains. Transition rates, 
jump chains and holding times. Forward and backward equations. Class structure, hitting 
times and absorption probabilities. Recurrence and transience. Invariant distributions 
and limiting behaviour. Time reversal. 

Applications of Markov chains in areas such as queues and queueing networks - 
M/M/s queue, Erlang's formula, queues in tandem and networks of queues, M/G/1 and
G/M/1 queues; insurance ruin models; epidemic models; applications in applied sciences. 

Renewal theory. Limit theorems: strong law of large numbers, central limit theo- 
rem, elementary renewal theorem, key renewal theorem. Excess life, inspection paradox. 
Applications. 

Reading 

• J.R. Norris, Markov chains, Cambridge University Press (1997) 

• G.R. Grimmett and D.R. Stirzaker, Probability and Random Processes, 3rd edition,
Oxford University Press (2001)

• G.R. Grimmett and D.R. Stirzaker, One Thousand Exercises in Probability, Oxford
University Press (2001)

• D.R. Stirzaker, Elementary Probability, Cambridge University Press (1994)

• S.M. Ross, Introduction to Probability Models, 4th edition, Academic Press (1989)
Lecture 1



Introduction: Poisson processes, 
generalisations and applications 

Reading: Part A Probability; Grimmett-Stirzaker 6.1, 6.8 up to (10) 
Further reading: Ross 4.1, 5.3; Norris Introduction, 1.1, 2.4

This course is, in the first place, a course for 3rd year undergraduates who did Part 
A Probability in their second year. Other students from various M.Sc.'s are welcome 
as long as they are aware of the prerequisites of the course. These are essentially an 
introductory course in probability not based on measure theory. It will be an advantage 
if this included the central aspects of discrete-time Markov chains, by the time we get to 
Lecture 5 in week 3. 

The aim of Lecture 1 is to give a brief overview of the course. To do this at an 
appropriate level, we begin with a review of Poisson processes that were treated at the 
very end of the Part A syllabus. The parts most relevant to us in today's lecture are 
again included here, and some more material is on the first assignment sheet. 

This is a mathematics course. "Applied probability" means that we apply probability:
not so much Part A Probability itself as further probability that builds on Part A and was
not covered there. Effectively, we will spend a lot of our time developing theory as
required for certain examples and applications.

For the rest of the course, let $\mathbb{N} = \{0, 1, 2, \ldots\}$ denote the natural numbers including
zero. Apart from very few exceptions, all stochastic processes that we consider in this
course will have state space $\mathbb{N}$ (or a subset thereof). However, most results in the theory
of Markov chains will be treated for any countable state space $\mathbb{S}$, which does not pose any
complications as compared with $\mathbb{N}$, since one can always enumerate all states in $\mathbb{S}$ and
hence give them labels in $\mathbb{N}$. For uncountable state spaces, however, several technicalities
arise that are beyond the scope of this course, at least in any reasonable generality - we
will naturally come across a few examples of Markov processes in $\mathbb{R}$ towards the end of
the course.
1.1 Poisson processes

There are many ways to define Poisson processes. We choose an elementary definition
that happens to be the most illustrative since it allows us to draw pictures straight away.

Definition 1 Let $(Z_n)_{n \ge 0}$ be a sequence of independent exponential random variables
$Z_n \sim \mathrm{Exp}(\lambda)$ for a parameter (inverse mean) $\lambda \in (0, \infty)$, $T_0 = 0$, $T_n = \sum_{k=0}^{n-1} Z_k$, $n \ge 1$.
Then the process $X = (X_t)_{t \ge 0}$ defined by

\[ X_t = \#\{n \ge 1 : T_n \le t\} \]

is called a Poisson process with rate $\lambda$.

Note that $(X_t)_{t \ge 0}$ is not just a family of (dependent!) random variables but indeed
$t \mapsto X_t$ is a random right-continuous function. This view is very useful since it is the
formal justification for pictures of "typical realisations" of $X$.
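Definition 1 translates directly into a simulation of one realisation. The following is a minimal Python sketch, not part of the original notes; the function names `poisson_arrival_times` and `count_at`, the rate 2.0 and the time horizon 5.0 are illustrative assumptions.

```python
import random

def poisson_arrival_times(rate, horizon, rng):
    """Return the arrival times T_1 < T_2 < ... that fall in [0, horizon]."""
    times = []
    t = rng.expovariate(rate)        # T_1 = Z_0 ~ Exp(rate)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)   # T_{n+1} = T_n + Z_n
    return times

def count_at(times, t):
    """X_t = #{n >= 1 : T_n <= t} for a list of arrival times."""
    return sum(1 for s in times if s <= t)

rng = random.Random(0)               # fixed seed, so the realisation is reproducible
arrivals = poisson_arrival_times(rate=2.0, horizon=5.0, rng=rng)
# Values of the right-continuous step function t -> X_t at a few times.
path = [(t, count_at(arrivals, t)) for t in (0.0, 1.0, 2.5, 5.0)]
```

Plotting `count_at(arrivals, t)` against `t` would give the kind of staircase picture of a typical realisation that the text refers to.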

Think of $T_n$ as arrival times of customers (arranged in increasing order). Then $X_t$ is
counting the number of arrivals up to time $t$ for all $t \ge 0$ and we study the evolution of
this counting process. Instead of customers, one might be counting particles detected by
a Geiger counter or cars driving through St. Giles, etc. Something more on the link and
the important distinction between real observations (cars in St. Giles) and mathematical
models (Poisson process) will be included in Lecture 2. For the moment we have a
mathematical model, well specified in the language of probability theory. Starting from a
simple sequence of independent random variables $(Z_n)_{n \ge 0}$ we have defined a more complex
object $(X_t)_{t \ge 0}$, which we call a Poisson process.

Let us collect some properties that, apart from some technical details, can serve as 
an alternative definition of the Poisson process. 

Remark 2 A Poisson process $X$ with rate $\lambda$ has the following properties.

(i) $X_t \sim \mathrm{Poi}(\lambda t)$ for all $t \ge 0$, where $\mathrm{Poi}(\lambda t)$ refers to the Poisson distribution with mean
$\lambda t$.

(ii) $X$ has independent increments, i.e. for all $t_0 \le \ldots \le t_n$, $X_{t_j} - X_{t_{j-1}}$, $j = 1, \ldots, n$,
are independent.

(iii) $X$ has stationary increments, i.e. for all $s \le t$, $X_{t+s} - X_t \sim X_s$, where $\sim$ means
"has the same distribution as".

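To justify (i), calculate

\begin{align*}
\mathbb{E}(q^{X_t}) &= \sum_{n=0}^{\infty} q^n \mathbb{P}(X_t = n) = \sum_{n=0}^{\infty} q^n \mathbb{P}(T_n \le t, T_{n+1} > t) \\
&= \sum_{n=0}^{\infty} q^n \big(\mathbb{P}(T_n \le t) - \mathbb{P}(T_{n+1} \le t)\big) = 1 - \sum_{j=1}^{\infty} q^{j-1}(1-q)\,\mathbb{P}(T_j \le t) \\
&= 1 - \int_0^t \sum_{j=1}^{\infty} q^{j-1}(1-q) \frac{\lambda^j}{(j-1)!} z^{j-1} e^{-\lambda z}\,dz \\
&= 1 - \int_0^t (1-q)\lambda e^{-\lambda z + \lambda q z}\,dz = e^{-\lambda t(1-q)},
\end{align*}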
where we used the well-known fact that $T_n$ as a sum of independent $\mathrm{Exp}(\lambda)$-variables has a
$\mathrm{Gamma}(n, \lambda)$ distribution. It is now easily checked that this is the probability generating
function of the Poisson distribution with parameter $\lambda t$. We conclude by the Uniqueness
Theorem for probability generating functions. Note that we interchanged summation and
integration. This may be justified by an expansion of $e^{\lambda q z}$ into a power series and using
uniform convergence of power series twice. We will see another justification in Lecture 3.
(ii)-(iii) can be derived from the following Proposition 3, see also Lecture 4.
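The generating function identity behind Remark 2(i) can also be checked numerically. The sketch below is an illustration only and no substitute for the proof: it estimates $\mathbb{E}(q^{X_t})$ by Monte Carlo and compares it with $e^{-\lambda t(1-q)}$. The helper names `sample_count` and `empirical_pgf`, the parameter values and the sample size are assumptions chosen for the example.

```python
import math
import random

def sample_count(rate, t, rng):
    """Sample X_t by adding Exp(rate) inter-arrival times until they exceed t."""
    n, total = 0, rng.expovariate(rate)
    while total <= t:
        n += 1
        total += rng.expovariate(rate)
    return n

def empirical_pgf(rate, t, q, samples, rng):
    """Monte Carlo estimate of E(q^{X_t})."""
    return sum(q ** sample_count(rate, t, rng) for _ in range(samples)) / samples

rng = random.Random(1)
rate, t, q = 2.0, 1.5, 0.5
estimate = empirical_pgf(rate, t, q, 20000, rng)
exact = math.exp(-rate * t * (1 - q))   # generating function of Poi(rate * t) at q
```

With these values `exact` equals $e^{-1.5}$, and `estimate` should agree with it up to Monte Carlo error.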

1.2 The Markov property 

Let $\mathbb{S}$ be a countable state space, typically $\mathbb{S} = \mathbb{N}$. Let $\Pi = (\pi_{rs})_{r,s \in \mathbb{S}}$ be a Markov
transition matrix on $\mathbb{S}$. For every $s_0 \in \mathbb{S}$ this specifies the distribution of a Markov chain
$(M_n)_{n \ge 0}$ starting from $s_0$ (under $\mathbb{P}_{s_0}$, say), by

\[ \mathbb{P}_{s_0}(M_1 = s_1, \ldots, M_n = s_n) = \mathbb{P}(M_1 = s_1, \ldots, M_n = s_n \mid M_0 = s_0) = \prod_{j=1}^{n} \pi_{s_{j-1}, s_j}. \]

We say that $(M_n)_{n \ge 0}$ is a Markov chain with transition matrix $\Pi$ starting from $s_0$. There
are several formulations of the Markov property:

• For all paths $s_0, \ldots, s_{n+1} \in \mathbb{S}$ of positive probability, we have

\[ \mathbb{P}(M_{n+1} = s_{n+1} \mid M_0 = s_0, \ldots, M_n = s_n) = \mathbb{P}(M_{n+1} = s_{n+1} \mid M_n = s_n) = \pi_{s_n, s_{n+1}}. \]

• For all $s \in \mathbb{S}$ and events $\{(M_0, \ldots, M_n) \in A\}$ and $\{(M_n, M_{n+1}, \ldots) \in B\}$, we have:
if $\mathbb{P}(M_n = s, (M_j)_{0 \le j \le n} \in A) > 0$, then

\[ \mathbb{P}((M_{n+k})_{k \ge 0} \in B \mid M_n = s, (M_j)_{0 \le j \le n} \in A) = \mathbb{P}((M_{n+k})_{k \ge 0} \in B \mid M_n = s). \]

• $(M_j)_{0 \le j \le n}$ and $(M_{n+k})_{k \ge 0}$ are conditionally independent given $M_n = s$, for all $s \in \mathbb{S}$.
Furthermore, given $M_n = s$, $(M_{n+k})_{k \ge 0}$ is a Markov chain with transition matrix $\Pi$
starting from $s$.

Informally: no matter how we got to a state, the future behaviour of the chain is as if 
we were starting a new chain from that state. This is one reason why it is vital to study 
Markov chains not starting from one initial state but from any state in the state space. 
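The product formula for path probabilities and the step-by-step nature of the Markov property can be made concrete in a few lines. This is a hedged sketch, not taken from the notes: the 3-state matrix `PI` is a made-up example, and `path_probability` and `simulate_chain` are hypothetical helper names.

```python
import random

# Hypothetical 3-state transition matrix; each row sums to 1.
PI = [
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.0, 0.4, 0.6],
]

def path_probability(pi, path):
    """P_{s_0}(M_1 = s_1, ..., M_n = s_n) as the product of pi[s_{j-1}][s_j]."""
    prob = 1.0
    for prev, nxt in zip(path, path[1:]):
        prob *= pi[prev][nxt]
    return prob

def simulate_chain(pi, start, steps, rng):
    """Simulate M_0, ..., M_steps; each step looks only at the current state."""
    path = [start]
    for _ in range(steps):
        row = pi[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path

p = path_probability(PI, [0, 1, 2, 2])   # 0.5 * 0.5 * 0.6
sample = simulate_chain(PI, start=0, steps=10, rng=random.Random(2))
```

Note that `simulate_chain` never consults anything but `path[-1]` when drawing the next state, which is the informal statement of the Markov property in code form.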

In analogy, we will here study Poisson processes $X$ starting from initial states
$X_0 = k \in \mathbb{N}$ (under $\mathbb{P}_k$), by which we just mean that we consider $X_t = k + \tilde{X}_t$, $t \ge 0$, where $\tilde{X}$
is a Poisson process starting from 0 as in the above definition.

Proposition 3 (Markov property) Let $X$ be a Poisson process with rate $\lambda$ starting
from 0 and $t \ge 0$ a fixed time. Then the following hold.

(i) For all $k \in \mathbb{N}$ and events $\{(X_r)_{r \le t} \in A\}$ and $\{(X_{t+s})_{s \ge 0} \in B\}$, we have: if
$\mathbb{P}(X_t = k, (X_r)_{r \le t} \in A) > 0$, then

\[ \mathbb{P}((X_{t+s})_{s \ge 0} \in B \mid X_t = k, (X_r)_{r \le t} \in A) = \mathbb{P}((X_{t+s})_{s \ge 0} \in B \mid X_t = k) = \mathbb{P}_k((X_s)_{s \ge 0} \in B). \]

(ii) Given $X_t = k$, $(X_r)_{r \le t}$ and $(X_{t+s})_{s \ge 0}$ are independent, and the conditional
distribution of $(X_{t+s})_{s \ge 0}$ is that of a Poisson process with rate $\lambda$ starting from $k$.

(iii) $(X_{t+s} - X_t)_{s \ge 0}$ is a Poisson process with rate $\lambda$ starting from 0, independent of
$(X_r)_{r \le t}$.

We will prove a more general Proposition 17 in Lecture 4. Also, in Lecture 2, we will 
revise and push further the notion of conditioning. For this lecture we content ourselves 
with the formulation of the Markov property and proceed to the overview of the course. 

Markov models (models that have the Markov property) are useful in a wide range of 
applications, e.g. price processes in Mathematical Finance, evolution of genetic material 
in Mathematical Biology, evolutions of particles in space in Mathematical Physics. The 
Markov property is a property that makes the model somewhat simple (not easy, but it 
could be much less tractable). We will develop tools that support this statement. 

1.3 Brief summary of the course 

Two generalisations of the Poisson process and several applications make up this course. 

• The Markov property of Proposition 3(ii) can be used as a starting point to a bigger 
class of processes, so-called continuous-time Markov chains. They are analogues of 
discrete-time Markov chains, and they are often better adapted to applications. On 
the other hand, new aspects arise that did not arise in discrete time, and connections 
between the two will be studied. Roughly, the first half of this course is concerned 
with continuous-time Markov chains. Our main reference book will be Norris's 
book on Markov Chains. 

• The Poisson process is the prototype of a counting process. For the Poisson process, 
"everything" can be calculated explicitly. In practice, though, this is often only 
helpful as a first approximation. E.g. in insurance applications, the Poisson process 
is used for the arrival of claims. However, there is empirical evidence that inter- 
arrival times are neither exponentially distributed nor independent nor identically 
distributed. The second approximation is to relax exponentiality of inter-arrival 
times but keep their independence and identical distribution. This class of counting 
processes is called renewal processes. Since exact calculations are often impossible 
or not helpful, the most important results of renewal theory are limiting results. Our 
main reference will be Chapter 10 of Grimmett and Stirzaker's book on Probability 
and Random Processes. 

• Many applications that we discuss are in queueing theory. The easiest, so-called 
M/M/1 queue consists of a server and customers arriving according to a Poisson
process. Independently of the arrival times, each customer has an exponential 
service time for which he will occupy the server, when it is his turn. If the server 
is busy, customers queue until being served. Everything has been designed so that 
the queue length is a continuous-time Markov chain, and various quantities can be 
studied or calculated (equilibrium distribution, lengths of idle periods, waiting time 
distributions etc.). More complicated queues arise if the Poisson process is replaced
by a renewal process or the exponential service times by any other distribution.
There are also systems with $k = 2, 3, \ldots, \infty$ servers. The abstract queueing systems
can be more concretely applied in telecommunication, computing networks, etc. 

• Some other applications include insurance ruin and propagation of diseases. 
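The M/M/1 queue described in the list above can be simulated jump by jump. The sketch below is an illustration rather than the construction used later in the course. It relies on two standard facts as assumptions: the minimum of independent exponential times is exponential with the summed rate, and the next event is an arrival with probability (arrival rate)/(total rate). The function name `simulate_mm1` and all parameter values are hypothetical.

```python
import random

def simulate_mm1(arrival_rate, service_rate, horizon, rng):
    """Return [(time, queue_length)] jump by jump, starting from an empty system."""
    t, length = 0.0, 0
    history = [(t, length)]
    while True:
        # Arrivals can always happen; a service completion only if someone is present.
        total = arrival_rate + (service_rate if length > 0 else 0.0)
        t += rng.expovariate(total)
        if t > horizon:
            return history
        if rng.random() < arrival_rate / total:
            length += 1      # a customer arrives
        else:
            length -= 1      # the customer in service leaves
        history.append((t, length))

history = simulate_mm1(arrival_rate=1.0, service_rate=2.0, horizon=50.0,
                       rng=random.Random(5))
```

The queue length includes the customer being served, and it changes by exactly one at each jump.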
Lecture 2



Conditioning and stochastic 
modelling 

Reading: Grimmett-Stirzaker 3.7, 4.6
Further reading: Grimmett-Stirzaker 4.7; CT4 Unit 1

This lecture consolidates the ideas of conditioning and modelling preparing a more 
varied range of applications and a less mechanical use of probability than what was the 
focus of Part A. Along the way, we explain the meaning of statements such as the Markov 
properties of Lecture 1. 

2.1 Modelling of events 

Much of probability theory is about events and probabilities of events. Informally, this 
is an easy concept. Events like $A_1$ = "the die shows an even number" and $A_2$ = "the first
customer arrives before 10am" make perfect sense in real situations. When it comes to
assigning probabilities, things are less clear. We seem to be able to write down some
probabilities ($\mathbb{P}(A_1) = 0.5$?) directly without much sophistication (still making implicit
assumptions about the fairness of the die and the conduct of the experiment). Others
($\mathbb{P}(A_2)$) definitely require a mathematical model.

Hardly any real situations involve genuine randomness. It is rather our incomplete 
perception/information that makes us think there was randomness. In fact, assuming a 
specific random model in our decision-making can be very helpful and lead to decisions 
that are sensible/good/beneficial in some sense. 

Mathematical models always make assumptions and reflect reality only partially.
Quite commonly, we have the following phenomenon: the better a model represents real- 
ity, the more complicated it is to analyse. There is a trade-off here. In any case, we must 
base all our calculations on the model specification, the model assumptions. Translating 
reality into models is a non-mathematical task. Analysing a model is purely mathemat- 
ical. Models have to be consistent, i.e. not contain contradictions. This statement may 
seem superfluous, but there are models that have undesirable features that cannot be 
easily removed, least by postulating the contrary. E.g., you may wish to specify a model 
for customer arrival where arrival counts over disjoint time intervals are independent,
arrival counts over time intervals of equal lengths have the same distribution (cf.
Remark 2 (ii)-(iii)), and times between two arrivals have a nonexponential distribution. Well,
such a model does not exist (we won't prove this statement now, it's a bit too hard at 
this stage). On the other hand, within a consistent model, all properties that were not 
specified in the model assumptions have to be derived from these. Otherwise it must be 
assumed that the model may not have the property. 

Suppose we are told that a shop opens at 9.30am, and on average, there are 10
customers per hour. One model could be to say that a customer arrives exactly every
six minutes. Another model could be to say that customers arrive according to a Poisson
process at rate $\lambda = 10$ (time unit = 1 hour). Whichever model you prefer, the fact is that you can
"calculate" $\mathbb{P}(A_2)$, and it is not the same in the two models, so we should reflect this in
our notation. We don't want it to be $A_2$ that changes, so it must be $\mathbb{P}$, and we may wish
to write $\tilde{\mathbb{P}}$ for the second model. $\mathbb{P}$ should be thought of as defining the randomness.
Similarly, we can express dependence on a parameter by $\mathbb{P}^{(\lambda)}$, dependence on an initial
value by $\mathbb{P}_k$. Informally, for a Poisson process model, we set $\mathbb{P}_k(A) = \mathbb{P}(A \mid X_0 = k)$ for
all events $A$ (formally you should wonder whether $\mathbb{P}(X_0 = k) > 0$).
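To see concretely that the probability of the event "the first customer arrives before 10am" depends on the model, one can evaluate it under both. This is a small sketch with assumptions beyond the text: in the deterministic model the first customer is taken to arrive six minutes after opening (the notes do not fix the first arrival time), and the function names are hypothetical.

```python
import math

def prob_deterministic(first_arrival_hours, window_hours):
    """Probability of the event when the first arrival time is fixed and known."""
    return 1.0 if first_arrival_hours < window_hours else 0.0

def prob_poisson(rate_per_hour, window_hours):
    """P(T_1 < window) = 1 - exp(-rate * window) for T_1 ~ Exp(rate)."""
    return 1.0 - math.exp(-rate_per_hour * window_hours)

window = 0.5                                  # 9.30am to 10am, in hours
p_det = prob_deterministic(6 / 60, window)    # assumed first arrival at 9.36am
p_poi = prob_poisson(10.0, window)            # Poisson process at rate 10 per hour
```

Under these assumptions the deterministic model gives probability 1, while the Poisson model gives $1 - e^{-5}$, which is slightly smaller.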

Aside: Formally, there is a way to define random variables as functions $X_t : \Omega \to \mathbb{N}$,
$Z_n : \Omega \to [0, \infty)$ etc. $\mathbb{P}_k$ can then be defined as a measure on $\Omega$ for all $k$, and this measure
is compatible with our distributional assumptions which claim that

the probability that $X_t = k + j$ is $\frac{(\lambda t)^j}{j!} e^{-\lambda t}$

in that

\[ \mathbb{P}_k(X_t = j + k) = \mathbb{P}_k(\{\omega \in \Omega : X_t(\omega) = j + k\}) = \frac{(\lambda t)^j}{j!} e^{-\lambda t}. \]

In the mathematical sense, the set $A_{j,k} := \{X_t = j + k\} := \{\omega \in \Omega : X_t(\omega) = j + k\} \subset \Omega$
is called an event. Technically¹, we cannot in general call all subsets of $\Omega$ events if $\Omega$ is
uncountable, but we will not worry about this, since it is very hard to find examples of
non-measurable sets. $\omega$ should be thought of as a scenario, a realisation of all the randomness. $\Omega$
collects the possibilities and $\mathbb{P}$ tells us how likely each event is to occur. If we denote the set
of all events by $\mathcal{A}$, then $(\Omega, \mathcal{A}, \mathbb{P})$ is called a stochastic basis. Its form is usually irrelevant. It
is important that it exists for all our purposes to make sure that the random objects we study
exist. We will assume that all our random variables can be defined as (measurable) functions
on $\Omega$. This existence can be proved for all our purposes, using measure theory. In fact, when
we express complicated families of random variables such as a Poisson process $(X_t)_{t \ge 0}$ in terms
of a countable family $(Z_n)_{n \ge 0}$ of independent random variables, we do this for two reasons.
The first should be apparent: countable families of independent variables are conceptually
easier than uncountable families of dependent variables. The second is that a result in measure
theory says that there exists a stochastic basis on which we can define countable families of
independent variables whereas any more general result for uncountable families or dependent
variables requires additional assumptions or other caveats.



¹The remainder of this paragraph is in a smaller font. This means (now and whenever something is
in small font) , that it can be skipped on first reading, and the reader may or may not want to get back 
to it at a later stage. 
It is very useful to think about random variables $Z_n$ as functions $Z_n(\omega)$, because
it immediately makes sense to define a Poisson process $X_t(\omega)$ as in Definition 1, by
defining new functions in terms of old functions. When learning probability, it is usual 
to first apply analytic rules to calculate distributions of functions of random variables 
(transformation formula for densities, expectation of a function of a random variable in 
terms of its density or probability mass function). Here we are dealing more explicitly 
with random variables and events themselves, operating on them directly. 

This course is not based on measure theory, but you should be aware that some 
of the proofs are only mathematically complete if based on measure theory. Ideally, 
this only means that we apply a result from measure theory that is intuitive enough to 
believe without proof. In a few cases, however, the gap is more serious. Every effort 
will be made to point out technicalities, but without drawing attention away from the 
probabilistic arguments that constitute this course and that are useful for applications. 

B10a Martingales Through Measure Theory provides as pleasant an introduction to
measure theory as can be given. That course nicely complements this course in providing 
the formal basis for probability theory in general and hence this course in particular. 
However, it is by no means a co-requisite, and when we do refer to this course, it is 
likely to be to material that has not yet been covered there. Williams' Probability with 
Martingales is the recommended book reference. 

2.2 Conditional probabilities, densities and expecta- 
tions 

Conditional probabilities were introduced in Part A (or even Mods) as

\[ \mathbb{P}(B \mid A) = \frac{\mathbb{P}(B \cap A)}{\mathbb{P}(A)}, \]

where we require $\mathbb{P}(A) > 0$.

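Example 4 Let $X$ be a Poisson process. Then, for $s \le t$,

\[ \mathbb{P}(X_t = k + j \mid X_s = k) = \frac{\mathbb{P}(X_s = k, X_t - X_s = j)}{\mathbb{P}(X_s = k)} = \mathbb{P}(X_t - X_s = j) = \mathbb{P}(X_{t-s} = j), \]

by the independence and stationarity of increments, Remark 2 (ii)-(iii).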
Conditional densities were introduced as

\[ f_{S|T}(s|t) = f_{S|T=t}(s) = \frac{f_{S,T}(s,t)}{f_T(t)}. \]

Example 5 Let $X$ be a Poisson process. Then, for $t > s$,

\[ f_{T_2|T_1=s}(t) = \frac{f_{T_1,T_2}(s,t)}{f_{T_1}(s)} = \frac{f_{Z_0}(s) f_{Z_1}(t-s)}{f_{Z_0}(s)} = f_{Z_1}(t-s) = \lambda e^{-\lambda(t-s)}, \]

by the transformation formula for bivariate densities to relate $f_{T_1,T_2}$ to $f_{Z_0,Z_1}$, and
independence of $Z_0$ and $Z_1$.
Conditioning has to do with available information. Many models are stochastic only
because the detailed deterministic structure is too complex. E.g. the counting process
of natural catastrophes (hurricanes, earthquakes, volcanic eruptions, spring tides etc.) is
genuinely deterministic. Since we cannot observe and model precisely weather, tectonic
movements etc., it is much more fruitful to write down a stochastic model, e.g. a Poisson
process, as a first approximation. We observe this process over time, and we can update
the stochastic process by its realisation. Suppose we know the value of the intensity
parameter $\lambda \in (0, \infty)$. (If we don't, any update will lead to a new estimate of $\lambda$, but
we do not worry about this here.) If the first arrival takes a long time to happen, this
gives us information about the second arrival time $T_2$, simply since $T_2 = T_1 + Z_1 > T_1$.
When we eventually observe $T_1 = s$, the conditional density of $T_2$ given $T_1 = s$ takes into
account this observation and captures the remaining stochastic properties of $T_2$. The
result of the formal calculation to derive the conditional density is in agreement with the
intuition that if $T_1 = s$, $T_2 = T_1 + Z_1$ ought to have the distribution of $Z_1$ shifted by $s$.

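Example 6 Conditional probabilities and conditional densities are compatible in that

\[ \mathbb{P}(S \in B \mid T = t) = \int_B f_{S|T=t}(s)\,ds = \lim_{\varepsilon \downarrow 0} \mathbb{P}(S \in B \mid t \le T \le t + \varepsilon), \]

provided only that $f_T$ is right-continuous. To see this, when $f_{S,T}$ is sufficiently smooth,
write for all intervals $B = (a, b)$

\[ \mathbb{P}(S \in B \mid t \le T \le t + \varepsilon) = \frac{\mathbb{P}(S \in B, t \le T \le t + \varepsilon)}{\mathbb{P}(t \le T \le t + \varepsilon)} = \frac{\frac{1}{\varepsilon} \int_t^{t+\varepsilon} \int_B f_{S,T}(s,u)\,ds\,du}{\frac{1}{\varepsilon} \mathbb{P}(t \le T \le t + \varepsilon)} \]

and under the smoothness condition (by dominated convergence etc.), this tends to

\[ \frac{\int_B f_{S,T}(s,t)\,ds}{f_T(t)} = \int_B f_{S|T=t}(s)\,ds = \mathbb{P}(S \in B \mid T = t). \]

Similarly, we can also define

\[ \mathbb{P}(X = k \mid T = t) = \lim_{\varepsilon \downarrow 0} \mathbb{P}(X = k \mid t \le T \le t + \varepsilon). \]

One can define conditional expectations in analogy with unconditional expectations, e.g.
in the latter case by

\[ \mathbb{E}(X \mid T = t) = \sum_{j=0}^{\infty} j\,\mathbb{P}(X = j \mid T = t). \]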

Proposition 7 a) If $X$ and $Y$ are (dependent) discrete random variables in $\mathbb{N}$, then

\[ \mathbb{E}(X) = \sum_{n=0}^{\infty} \mathbb{E}(X \mid Y = n)\,\mathbb{P}(Y = n). \]

b) If $X$ and $T$ are jointly continuous random variables in $(0, \infty)$ or c) if $X$ is discrete
and $T$ is continuous, and if $T$ has a right-continuous density, then

\[ \mathbb{E}(X) = \int_0^{\infty} \mathbb{E}(X \mid T = t)\,f_T(t)\,dt. \]
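Proof: c) We start at the right-hand side

\[ \int_0^{\infty} \mathbb{E}(X \mid T = t)\,f_T(t)\,dt = \int_0^{\infty} \sum_{j=0}^{\infty} j\,\mathbb{P}(X = j \mid T = t)\,f_T(t)\,dt \]

and calculate

\begin{align*}
\mathbb{P}(X = j \mid T = t) &= \lim_{\varepsilon \downarrow 0} \frac{\mathbb{P}(X = j, t \le T \le t + \varepsilon)}{\mathbb{P}(t \le T \le t + \varepsilon)} \\
&= \lim_{\varepsilon \downarrow 0} \frac{\frac{1}{\varepsilon} \mathbb{P}(t \le T \le t + \varepsilon \mid X = j)\,\mathbb{P}(X = j)}{\frac{1}{\varepsilon} \mathbb{P}(t \le T \le t + \varepsilon)} \\
&= \frac{f_{T|X=j}(t)\,\mathbb{P}(X = j)}{f_T(t)}
\end{align*}

so that we get on the right-hand side

\[ \int_0^{\infty} \sum_{j=0}^{\infty} j\,\mathbb{P}(X = j \mid T = t)\,f_T(t)\,dt = \sum_{j=0}^{\infty} j\,\mathbb{P}(X = j) \int_0^{\infty} f_{T|X=j}(t)\,dt = \mathbb{E}(X) \]

after interchanging summation and integration. This is justified by Tonelli's theorem
that we state in Lecture 3.

b) is similar to c).

a) is more elementary and left to the reader. □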

Statement and argument hold for left-continuous densities and approximations from
the left, as well. For continuous densities, one can also approximate $\{T = t\}$ by
$\{t - \varepsilon \le T \le t + \varepsilon\}$ (for $\varepsilon < t$, and normalisation by $2\varepsilon$, as adequate).

Recall that we formulated the Markov property of the Poisson process as

\[ \mathbb{P}((X_{t+u})_{u \ge 0} \in B \mid X_t = k, (X_r)_{r \le t} \in A) = \mathbb{P}_k((X_u)_{u \ge 0} \in B) \]

for all events $\{(X_r)_{r \le t} \in A\}$ such that $\mathbb{P}(X_t = k, (X_r)_{r \le t} \in A) > 0$, and $\{(X_{t+u})_{u \ge 0} \in B\}$.
For certain sets $A$ with zero probability, this can still be established by approximation.

2.3 Independence and conditional independence 

Recall that independence of two random variables is defined as follows. Two discrete
random variables $X$ and $Y$ are independent if

\[ \mathbb{P}(X = j, Y = k) = \mathbb{P}(X = j)\,\mathbb{P}(Y = k) \quad \text{for all } j, k \in \mathbb{S}. \]

Two jointly continuous random variables $S$ and $T$ are independent if their joint density
factorises, i.e. if

\[ f_{S,T}(s,t) = f_S(s) f_T(t) \quad \text{for all } s, t \in \mathbb{R}, \quad \text{where } f_S(s) = \int_{\mathbb{R}} f_{S,T}(s,t)\,dt. \]
Recall also (or check) that this is equivalent, in both cases, to

\[ \mathbb{P}(S \le s, T \le t) = \mathbb{P}(S \le s)\,\mathbb{P}(T \le t) \quad \text{for all } s, t \in \mathbb{R}. \]

In fact, it is also equivalent to

\[ \mathbb{P}(S \in A, T \in B) = \mathbb{P}(S \in A)\,\mathbb{P}(T \in B) \quad \text{for all (measurable) } A, B \subset \mathbb{R}, \]

and we define more generally:

Definition 8 Let $X$ and $Y$ be two random variables with values in any, possibly different
spaces $\mathbb{X}$ and $\mathbb{Y}$. Then we call $X$ and $Y$ independent if

\[ \mathbb{P}(X \in A, Y \in B) = \mathbb{P}(X \in A)\,\mathbb{P}(Y \in B) \quad \text{for all (measurable) } A \subset \mathbb{X} \text{ and } B \subset \mathbb{Y}. \]

We call $X$ and $Y$ conditionally independent given a third random variable $Z$ if for all
$z \in \mathbb{S}$ (if $Z$ has values in $\mathbb{S}$) or $z \in [0, \infty)$ (if $Z$ has values in $[0, \infty)$),

\[ \mathbb{P}(X \in A, Y \in B \mid Z = z) = \mathbb{P}(X \in A \mid Z = z)\,\mathbb{P}(Y \in B \mid Z = z). \]

Remark and Fact 9² Conditional independence is in many ways like ordinary (unconditional)
independence. E.g., if $X$ is discrete, it suffices to check the condition for $A = \{x\}$,
$x \in \mathbb{X}$. If $Y$ is real-valued, it suffices to consider $B = (-\infty, y]$, $y \in \mathbb{R}$. If $Y$ is bivariate,
it suffices to consider all $B$ of the form $B = B_1 \times B_2$.

If $\mathbb{X} = \{f : [0,t] \to \mathbb{N} \text{ right-continuous}\}$, it suffices to consider $A = \{f \in \mathbb{X} : f(r_1) =
n_1, \ldots, f(r_m) = n_m\}$ for all $0 \le r_1 < \ldots < r_m \le t$ and $n_1, \ldots, n_m \in \mathbb{N}$. This is the basis
for the meaning of Proposition 3(ii).

We conclude by a fact that may seem obvious, but does not follow immediately from 
the definition. Also the approximation argument only gives some special cases. 

Fact 10 Let $X$ be any random variable, and $T$ a real-valued random variable with
right-continuous density. Then, for all (measurable) $f : \mathbb{X} \times [0, \infty) \to [0, \infty)$, we have

\[ \mathbb{E}(f(X,T) \mid T = t) = \mathbb{E}(f(X,t) \mid T = t). \]

Furthermore, if $X$ and $T$ are independent and $g : \mathbb{X} \to [0, \infty)$ (measurable) we have

\[ \mathbb{E}(g(X) \mid T = t) = \mathbb{E}(g(X)). \]

If $X$ takes values in $[0, \infty)$ also, examples for $f$ are e.g. $f(x,t) = 1_{\{x+t>s\}}$, where
$1_{\{x+t>s\}} := 1$ if $x + t > s$ and $1_{\{x+t>s\}} := 0$ otherwise; or $f(x,t) = e^{\gamma(x+t)}$ for a constant $\gamma \in \mathbb{R}$, in which case
the statements are

\[ \mathbb{P}(X + T > s \mid T = t) = \mathbb{P}(X + t > s \mid T = t) \quad \text{and} \quad \mathbb{E}(e^{\gamma(X+T)} \mid T = t) = e^{\gamma t}\,\mathbb{E}(e^{\gamma X} \mid T = t), \]

and the condition $\{T = t\}$ can be removed on the right-hand sides if $X$ and $T$ are
independent. This can be shown by the approximation argument.
The analogue of Fact 10 for discrete T is elementary. 

²Facts are results that are true, but that we cannot prove in this course. Note also that there is a
grey zone between facts and propositions, since proofs or partial proofs sometimes appear on assignment 
sheets, in the main or optional parts. 



Lecture 3 

Birth processes and explosion 



Reading: Norris 2.2-2.3, 2.5; Grimmett-Stirzaker 6.8 (11), (18)-(20)



In this lecture we introduce birth processes in analogy with our definition of Poisson 
processes. The common description (also for Markov chains) is always as follows: given 
the current state is m, what is the next part of the evolution, and (for the purpose 
of an inductive description) how does it depend on the past? (Answer to the last bit: 
conditionally independent given the current state, but this can here be expressed in terms 
of genuine independence). 

3.1 Definition and an example 

If we consider the Poisson process as a model for a growing population, it is not always 
sensible to assume that new members are born at the same rate regardless of the size
of the population. You would rather expect this rate to increase with size (more births 
in larger populations), although some saturation effects may make sense as well. 

Also, if the Poisson process is used as a counting process of alpha particle emissions 
of a decaying radioactive substance, it makes sense to assume that the rate is decreasing 
with the number of emissions, particularly if the half-life time is short. 

Definition 11 A stochastic process $X = (X_t)_{t \ge 0}$ of the form

\[ X_t = k + \#\{n \ge 1 : Z_0 + \ldots + Z_{n-1} \le t\} \]

is called a simple birth process of rates $(\lambda_n)_{n \ge 0}$ starting from $X_0 = k \in \mathbb{N}$, if the
inter-arrival times $Z_j$, $j \ge 0$, are independent exponential random variables with parameters
$\lambda_{k+j}$, $j \ge 0$. $X$ is called a $(k, (\lambda_n)_{n \ge 0})$-birth process.

Note that the parameter $\lambda_n$ is attached to height $n$. The so-called holding time at
height $n$ has an $\mathrm{Exp}(\lambda_n)$ distribution.

"Simple" refers to the fact that no two births occur at the same time, which one 
would call "multiple" births. Multiple birth processes can be studied as well, and, given 
certain additional assumptions, form examples of continuous-time Markov chains, like 
simple birth processes do, as we will see soon. 




Example 12 Consider a population in which each individual gives birth after an exponential
time of parameter $\lambda$, all independently and repeatedly.

If $n$ individuals are present, then the first birth will occur after an exponential time
of parameter $n\lambda$. Then we have $n + 1$ individuals and, by the lack of memory property
(see Assignment 1), the process begins afresh with $n + 1$ exponential clocks independent
of the previous evolution ($n - 1$ clocks still running with residual $\mathrm{Exp}(\lambda)$ times and 2
new clocks from the individual that has just given birth and the new-born individual).
By induction, the size of the population performs a birth process with rates $\lambda_n = n\lambda$.

Let $X_t$ denote the number of individuals at time $t$ and suppose $X_0 = 1$. Write $T$ for
the time of the first birth. Then by Proposition 7c)

\[ \mathbb{E}(X_t) = \int_0^{\infty} \lambda e^{-\lambda u}\,\mathbb{E}(X_t \mid T = u)\,du = \int_0^{t} \lambda e^{-\lambda u}\,\mathbb{E}(X_t \mid T = u)\,du + e^{-\lambda t} \]

since $X_t = X_0 = 1$ if $T > t$.

Put $\mu(t) = \mathbb{E}(X_t)$, then for $0 < u < t$, intuitively $\mathbb{E}(X_t \mid T = u) = 2\mu(t - u)$ since from
time $u$ the two individuals perform independent birth processes of the same type and we
are interested in their population sizes $t - u$ time units later. We will investigate a more
formal argument later, when we have the Markov property at our disposal.

Now

\[ \mu(t) = \int_0^{t} 2\lambda e^{-\lambda u} \mu(t - u)\,du + e^{-\lambda t} \]

and setting $r = t - u$

\[ e^{\lambda t} \mu(t) = 1 + 2\lambda \int_0^{t} e^{\lambda r} \mu(r)\,dr. \]

Differentiating we obtain

\[ \mu'(t) = \lambda \mu(t) \]

so the mean population size grows exponentially, and $X_0 = 1$ implies

\[ \mathbb{E}(X_t) = \mu(t) = e^{\lambda t}. \]
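The exponential growth of the mean in Example 12 can be checked by simulating the birth process with rates $n\lambda$ directly from its holding times. The following sketch is illustrative only; the function name `birth_population`, the choice $\lambda = 1$, $t = 1$ and the number of runs are assumptions made for the example.

```python
import math
import random

def birth_population(rate, t, rng):
    """Population size at time t of a birth process with rates n * rate, X_0 = 1."""
    n, clock = 1, 0.0
    while True:
        clock += rng.expovariate(n * rate)   # holding time at height n
        if clock > t:
            return n
        n += 1

rng = random.Random(3)
rate, t, runs = 1.0, 1.0, 20000
mean_size = sum(birth_population(rate, t, rng) for _ in range(runs)) / runs
exact_mean = math.exp(rate * t)              # mean population size at time t
```

The empirical mean `mean_size` should be close to `exact_mean`, up to Monte Carlo error.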

3.2 Tonelli's Theorem, monotone and dominated con- 
vergence 

The following is a result from measure theory that we cannot prove or dwell on here. 

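Fact 13 (Tonelli) You may interchange order of integration, countable summation and
expectation whenever the integrand/summands/random variables are nonnegative, e.g.

\[ \mathbb{E}\Big(\sum_{n \ge 0} X_n\Big) = \sum_{n \ge 0} \mathbb{E}(X_n), \qquad \int_0^{\infty} \sum_{n \ge 0} f_n(x)\,dx = \sum_{n \ge 0} \int_0^{\infty} f_n(x)\,dx, \]

\[ \int_0^{\infty} \int_0^{x} f(x,y)\,dy\,dx = \int_0^{\infty} \int_y^{\infty} f(x,y)\,dx\,dy. \]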
There were already two applications of this, one each in Lectures 1 and 2. The focus
was on other parts of the argument, but you may wish to consider the justification of 
Remark 2 and Proposition 7 again now. 

Interchanging limits is more delicate, and there are monotone and dominated conver- 
gence for this purpose. In this course we will only interchange limits when this is justified 
by monotone or dominated convergence, but we do not have the time to work out the 
details. Here are indicative statements. 



Fact 14 (Monotone convergence) Integrals (expectations) of an increasing sequence
of nonnegative functions (random variables $Y_n$) converge (in the sense that
$\lim_{n \to \infty} \mathbb{E}(Y_n) = \mathbb{E}(\lim_{n \to \infty} Y_n)$).

Fact 15 (Dominated convergence) Integrals (expectations) of a pointwise convergent
sequence of functions $f_n \to f$ (applied to a random variable) converge, if the sequence
$|f_n| \le g$ is dominated by an integrable function $g$ (function which when applied to the
random variable has finite expectation), i.e.

\[ \int g(x)\,dx < \infty \;\Rightarrow\; \lim_{n \to \infty} \int f_n(x)\,dx = \int \lim_{n \to \infty} f_n(x)\,dx, \]

\[ \mathbb{E}(g(X)) < \infty \;\Rightarrow\; \lim_{n \to \infty} \mathbb{E}(f_n(X)) = \mathbb{E}\big(\lim_{n \to \infty} f_n(X)\big). \]

We refer to B10a Martingales Through Measure Theory for those of you who follow
that course. In any case, our working hypothesis is that, in practice, we may interchange 
all limits that come up in this course, but also know that these two theorems are required 
for formal justification, and refer to them. 



3.3 Explosion 

If the rates $(\lambda_n)_{n\in\mathbb{N}}$ increase too quickly, it may happen that infinitely many individuals
are born in finite time (as with deterministic inter-birth times). We call this phenomenon
explosion. Formally, we can express the possibility of explosion by $\mathbb{P}(T_\infty < \infty) > 0$, where
$T_\infty = \lim_{n\to\infty} T_n$. Remember that it is not a valid argument to say that this is ridiculous
for the application we are modelling and hence cannot occur in our model. We have
to check whether it can occur under the model assumptions. And if it does occur and
is ridiculous for the application, it means that the model is not a good model for the
application. For simple birth processes, we have the following necessary and sufficient
condition.



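Proposition 16 Let $X$ be a $(k, (\lambda_n)_{n\ge 0})$-birth process. Then

$$\mathbb{P}(T_\infty < \infty) > 0 \quad\text{if and only if}\quad \sum_{m=k}^{\infty}\frac{1}{\lambda_m} < \infty.$$

Furthermore, in this case $\mathbb{P}(T_\infty < \infty) = 1$.

Proof: Recall that the holding times $Z_n \sim Exp(\lambda_{k+n})$, $n \ge 0$, are independent. Note that

$$\mathbb{E}(T_\infty) = \mathbb{E}\Big(\sum_{n=0}^{\infty} Z_n\Big) = \sum_{n=0}^{\infty}\mathbb{E}(Z_n) = \sum_{m=k}^{\infty}\frac{1}{\lambda_m},$$

where Tonelli's Theorem allows us to interchange summation and expectation. Therefore,
if the series is finite, then $\mathbb{E}(T_\infty) < \infty$ implies $\mathbb{P}(T_\infty < \infty) = 1$.

Since, in general, $\mathbb{P}(T_\infty < \infty) = 1$ does not imply $\mathbb{E}(T_\infty) < \infty$, we are not yet done for
the converse. However, using monotone convergence and also the independence of the
$Z_n$, we can calculate

$$-\log\mathbb{E}\big(e^{-T_\infty}\big) = -\sum_{n=0}^{\infty}\log\mathbb{E}\big(e^{-Z_n}\big) = \sum_{m=k}^{\infty}\log\Big(1+\frac{1}{\lambda_m}\Big).$$

Either this latter sum is greater than $\log(2)\sum_{m=n_0}^{\infty}\frac{1}{\lambda_m}$ if $\lambda_n \ge 1$ for $n \ge n_0$, by linear
interpolation and concavity of the logarithm. Or otherwise a restriction to any subsequence
$\lambda_{n_j} < 1$ shows that the sum is infinite as each of these summands contributes at
least $\log(2)$.

Therefore, if the series diverges, then $\mathbb{E}(e^{-T_\infty}) = 0$, i.e. $\mathbb{P}(T_\infty = \infty) = 1$. □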

Note that we have not explicitly specified what happens after $T_\infty$ if $T_\infty < \infty$. With
a population size model in mind, $X_t = \infty$ for all $t \ge T_\infty$ is a reasonable convention.
Formally, this means that $X$ is a process in $\overline{\mathbb{N}} = \mathbb{N} \cup \{\infty\}$. This process is often called
the minimal process, since it is "active" on a minimal time interval. We will show the
Markov property for minimal processes. It can also be shown that there are other ways
to specify $X$ after explosion that preserve the Markov property. The next natural thing
to do is to start afresh after explosion. Such a process is then called non-minimal.
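The explosion criterion of Proposition 16 can be explored numerically. The following is only a sketch under assumptions that are not part of the notes: the rates $\lambda_m = m^2$ and $\lambda_m = m$, the start $k = 1$, the truncation at 2000 terms and the function names are all illustrative choices. It compares partial sums of $\sum 1/\lambda_m$ and simulates birth times as sums of independent exponential holding times.

```python
import random

def partial_sum_of_means(rates):
    """Partial sum of the mean holding times 1/lambda_m over a finite list of rates."""
    return sum(1.0 / lam for lam in rates)

def birth_times(rates, rng):
    """Birth times obtained by adding independent Exp(lambda_m) holding times."""
    t, times = 0.0, []
    for lam in rates:
        t += rng.expovariate(lam)
        times.append(t)
    return times

n_terms = 2000  # illustrative truncation, not a statement about the limit itself
quadratic = [float(m * m) for m in range(1, n_terms + 1)]  # 1/lambda_m summable
linear = [float(m) for m in range(1, n_terms + 1)]         # 1/lambda_m not summable

sum_quadratic = partial_sum_of_means(quadratic)  # stays below pi^2/6 for any truncation
sum_linear = partial_sum_of_means(linear)        # harmonic sum, grows without bound

rng = random.Random(0)
times_quadratic = birth_times(quadratic, rng)
print(sum_quadratic, sum_linear, times_quadratic[-1])
```

With quadratic rates the partial sums of expected holding times stay bounded, which is the explosive case of the proposition. With linear rates they keep growing as more terms are added, which is the non-explosive case.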



Lecture 4 

Birth processes and the Markov 
property 

Reading: Norris 2.4-2.5; Grimmett-Stirzaker 6.8 (21)-(25) 
In this lecture we discuss in detail the Markov property for birth processes. 

4.1 Statements of the Markov property 

Proposition 17 (Markov property) Let $X$ be a simple birth process with rates $(\lambda_n)_{n\ge 0}$
starting from $k \ge 0$, and $t \ge 0$ a fixed time, $\ell \ge k$ a fixed height. Then given $X_t = \ell$,
$(X_r)_{r\le t}$ and $(X_{t+s})_{s\ge 0}$ are conditionally independent, and the conditional distribution of
$(X_{t+s})_{s\ge 0}$ is that of a simple birth process with rates $(\lambda_n)_{n\ge 0}$ starting from $\ell$. We use
notation

$$(X_r)_{r\le t} \coprod_{X_t=\ell} (X_{t+s})_{s\ge 0} \sim (\ell, (\lambda_n)_{n\ge 0})\text{-birth process}.$$

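Example 18 In formulas, we can write (and apply)

$$\mathbb{P}((X_r)_{r\le t}\in A, (X_{t+s})_{s\ge 0}\in B \mid X_t=\ell) = \mathbb{P}((X_r)_{r\le t}\in A \mid X_t=\ell)\,\mathbb{P}((X_{t+s})_{s\ge 0}\in B \mid X_t=\ell)$$
$$= \mathbb{P}((X_r)_{r\le t}\in A \mid X_t=\ell)\,\mathbb{P}((\tilde X_s)_{s\ge 0}\in B),$$

where $\tilde X$ is an $(\ell, (\lambda_n)_{n\ge 0})$-birth process and $A$, $B$ arbitrary sets of paths, e.g.
$A = \{f : [0,t]\to\mathbb{N} : f(r_j) = n_j, j = 1,\ldots,m\}$ for some $0 \le r_1 < \ldots < r_m \le t$, $n_1,\ldots,n_m \in \mathbb{N}$. Note that

$$\{(X_r)_{r\le t}\in A\} = \{X_{r_j} = n_j \text{ for all } j = 1,\ldots,m\}.$$

Therefore, in particular

$$\mathbb{P}(X_{r_1}=n_1,\ldots,X_{r_m}=n_m, X_{t+s_1}=p_1,\ldots,X_{t+s_h}=p_h \mid X_t=\ell)$$
$$= \mathbb{P}(X_{r_1}=n_1,\ldots,X_{r_m}=n_m \mid X_t=\ell)\,\mathbb{P}(\tilde X_{s_1}=p_1,\ldots,\tilde X_{s_h}=p_h).$$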
The Markov property as formulated in Proposition 17 (or Example 18) says that
"past and future are (conditionally) independent given the present". We can say this in
a rather different way as "the past is irrelevant for the future, only the present matters".
Let us derive this reformulation using the elementary rules of conditional probability: the
Markov property is about three events

past $E = \{(X_r)_{r\le t}\in A\}$, future $F = \{(X_{t+s})_{s\ge 0}\in B\}$ and present $C = \{X_t = \ell\}$

and states $\mathbb{P}(E\cap F \mid C) = \mathbb{P}(E \mid C)\,\mathbb{P}(F \mid C)$. This is a special property that does not hold
for general events $E$, $F$ and $C$. In fact, by the definition of conditional probabilities, we
always have

$$\mathbb{P}(E\cap F \mid C) = \frac{\mathbb{P}(E\cap F\cap C)}{\mathbb{P}(C)} = \frac{\mathbb{P}(F \mid E\cap C)\,\mathbb{P}(E\cap C)}{\mathbb{P}(C)} = \mathbb{P}(F \mid E\cap C)\,\mathbb{P}(E \mid C),$$

so that, by comparison, if $\mathbb{P}(E\cap C) > 0$, we here have $\mathbb{P}(F \mid E\cap C) = \mathbb{P}(F \mid C)$, and we
deduce

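Corollary 19 (Markov property, alternative formulation) For all $t \ge 0$, $\ell \ge k$
and sets of paths $A$ and $B$ with $\mathbb{P}(X_t = \ell, (X_r)_{r\le t}\in A) > 0$, we have

$$\mathbb{P}((X_{t+s})_{s\ge 0}\in B \mid X_t=\ell, (X_r)_{r\le t}\in A) = \mathbb{P}((X_{t+s})_{s\ge 0}\in B \mid X_t=\ell) = \mathbb{P}((\tilde X_s)_{s\ge 0}\in B),$$

where $\tilde X$ is an $(\ell, (\lambda_n)_{n\ge 0})$-birth process.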

In fact, the condition $\mathbb{P}(X_t = \ell, (X_r)_{r\le t}\in A) > 0$ can often be waived, if the
conditional probabilities can still be defined via an approximation by events of positive
probability. This is a very informal statement since different approximations could, in
principle, give rise to different values for such conditional probabilities. This is, in fact, very subtle:
uniqueness cannot be achieved in a strong sense, and some exceptional points usually occur.
Formally, one would introduce versions of conditional probabilities, and sometimes, a nice
version can be identified. For us, this is just a technical hurdle that we do not attempt. If we did
attempt it, we would find enough continuity (or right-continuity), and see that we usually
integrate conditional probabilities so that any non-uniqueness would not affect our final answers.
See B10a Martingales Through Measure Theory for details.



4.2 Application of the Markov property 

We make (almost) rigorous the intuitive argument used in Example 12 to justify
$\mathbb{E}(X_u \mid T = t) = 2\,\mathbb{E}(X_{u-t})$, where $u \ge t$, $X$ is a $(1, (n\lambda)_{n\ge 0})$-birth process and
$T = \inf\{t \ge 0 : X_t = 2\}$. In fact, we now have

$$\{T = t\} = \{X_r = 1, r < t;\ X_t = 2\} \subset \{X_t = 2\}$$

so that for all $n \ge 0$

$$\mathbb{P}(X_u = n \mid T = t) = \mathbb{P}(X_u = n \mid X_r = 1, r < t;\ X_t = 2) = \mathbb{P}(X_u = n \mid X_t = 2) = \mathbb{P}(\tilde X_{u-t} = n),$$

where $\tilde X$ is a $(2, (n\lambda)_{n\ge 0})$-birth process. This also gives

$$\mathbb{E}(X_u \mid T = t) = \sum_{n\in\mathbb{N}} n\,\mathbb{P}(X_u = n \mid T = t) = \sum_{n\in\mathbb{N}} n\,\mathbb{P}(\tilde X_{u-t} = n \mid \tilde X_0 = 2) = \mathbb{E}(\tilde X_{u-t}) = 2\,\mathbb{E}(X_{u-t}),$$

since, by model assumption, the families of two individuals evolve completely independently
like separate populations starting from one individual each.

Remark: The event $\{T = t\}$ has probability zero ($\mathbb{P}(T = t) = 0$ as $T$ is exponentially
distributed). Therefore any conditional probabilities given $T = t$ have to be approximated
(using $\{t - \varepsilon < T \le t\}$) in order to justify the application of the Markov property. We will not
go through the details of this in this course and leave this to the interested reader. Our focus is
on the probabilistic argument that the application of the Markov property constitutes here.

4.3 Proof of the Markov property 

Proof: Assume for simplicity that $X_0 = 0$. The general case can then be deduced. On
$\{X_t = k\} = \{T_k \le t < T_{k+1}\}$ we have

$$X_{t+s} = \#\Big\{n \ge 1 : \sum_{j=0}^{n-1} Z_j \le t+s\Big\} = k + \#\Big\{n \ge k+1 : \sum_{j=0}^{n-1} Z_j - t \le s\Big\} = k + \#\Big\{m \ge 1 : \sum_{j=0}^{m-1} \tilde Z_j \le s\Big\},$$

where $\tilde Z_j = Z_{k+j}$, $j \ge 1$, and $\tilde Z_0 = T_{k+1} - t$. Therefore, $\tilde X = (X_{t+s})_{s\ge 0}$ has the structure of a
birth process starting from $k$, since given $\tilde X_0 = k$, the $\tilde Z_j \sim Exp(\lambda_{k+j})$ are independent
(conditionally given $X_t = k$). For $j = 0$ note that

$$\mathbb{P}(\tilde Z_0 > z \mid X_t = k) = \mathbb{P}(Z_k > (t - T_k) + z \mid Z_k > t - T_k \ge 0) = \mathbb{P}(Z_k > z),$$

where we applied the lack of memory property of $Z_k$ to the independent threshold $t - T_k$.
This actually requires a slightly more thorough explanation since we are dealing with
repeated conditioning (first $X_t = k$, then $Z_k > t - T_k$), but the key result that we need is

Lemma 20 (Exercise A.1.4(b)) The lack of memory property of the exponential
distribution holds at independent thresholds, i.e. for $Z \sim Exp(\lambda)$ and $L$ a random variable
independent of $Z$, the following holds: given $Z > L$, $Z - L \sim Exp(\lambda)$ and $Z - L$ is
conditionally independent of $L$.

The proof is now completed again by the lack of memory property to see that $\tilde Z_0$ (as
well as the $\tilde Z_j$, $j \ge 1$) is conditionally independent of $Z_0, \ldots, Z_{k-1}$ given $X_t = k$, and the
assertion follows. □
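Lemma 20 lends itself to a quick Monte Carlo sanity check. This is a sketch only: the choice $L \sim$ Uniform$(0,2)$, the rate 1.5, the sample size and the function names are assumptions made for illustration. The code compares the sample mean of $Z - L$ on the event $\{Z > L\}$ with $1/\lambda$, both overall and separately for small and large thresholds.

```python
import random

def residuals_given_exceed(lam, n, rng):
    """Sample pairs (L, Z - L) on the event Z > L, with Z ~ Exp(lam), L ~ Uniform(0, 2) independent."""
    out = []
    for _ in range(n):
        z = rng.expovariate(lam)
        threshold = rng.uniform(0.0, 2.0)
        if z > threshold:
            out.append((threshold, z - threshold))
    return out

def mean(values):
    return sum(values) / len(values)

lam = 1.5
pairs = residuals_given_exceed(lam, 200_000, random.Random(1))

overall = mean([r for _, r in pairs])               # should be close to 1/lam
small_l = mean([r for l, r in pairs if l < 1.0])    # residual mean when the threshold is small
large_l = mean([r for l, r in pairs if l >= 1.0])   # residual mean when the threshold is large
print(overall, small_l, large_l, 1.0 / lam)
```

Agreement of the three sample means with $1/\lambda$ is consistent with the residual $Z - L$ being $Exp(\lambda)$ whatever the threshold. It checks first moments only and is not a proof of conditional independence.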
4.4 The strong Markov property

The Markov property means that at whichever fixed time we inspect our process, infor- 
mation about the past is not relevant for its future behaviour. If we think of an example 
of radioactive decay, this would allow us to describe the behaviour of the emission pro- 
cess, say, after the experiment has been running for 2 hours. Alternatively, we may wish 
to reinspect after 1000 emissions. This time is random, but we can certainly carry out 
any action we wish at the time of the 1000th emission. Such times are called stopping 
times. 

We will only look at stopping times of the form

$$T_{\{n\}} = \inf\{t \ge 0 : X_t = n\} \quad\text{or}\quad T_C = \inf\{t \ge 0 : X_t \in C\}$$

for $n \in \mathbb{N}$ or $C \subset \mathbb{N}$ (or later $C \subset \mathbb{S}$, a countable state space).

Proposition and Fact 21 (Strong Markov property) (i) Let $X$ be a simple birth
process with rates $(\lambda_n)_{n\ge 0}$ and $T \ge 0$ a stopping time. Then for all $k \in \mathbb{N}$ with
$\mathbb{P}(X_T = k) > 0$, we have given $T < \infty$ and $X_T = k$, $(X_r)_{r\le T}$ and $(X_{T+s})_{s\ge 0}$ are
independent, and the conditional distribution of $(X_{T+s})_{s\ge 0}$ is a simple birth process
with rates $(\lambda_n)_{n\ge 0}$ starting from $k$.

(ii) In the special case where $X$ is a Poisson process and $\mathbb{P}(T < \infty) = 1$, $(X_{T+s} - X_T)_{s\ge 0}$
is a Poisson process starting from 0, independent of $T$ and $(X_r)_{r\le T}$.

The proof of the strong Markov property (in full generality) is beyond the scope of this
course, but we will use the result from time to time. See the Appendix of Norris's book
for details. Note however, that the proof of the strong Markov property for first hitting
times $T_i$ is not so hard since then $\mathbb{P}(X_{T_i} = i) = 1$, so the only relevant statement is for
$k = i$, and $(X_r)_{r\le T_i}$ can be expressed in terms of $Z_j$, $0 \le j \le i-1$, and $(X_{T_i+s} - X_{T_i})_{s\ge 0}$ in
terms of $Z_j$, $j \ge i$. In fact, this is not just a sketch of a proof, but a complete proof, if we
make two more remarks. First, conditioning on $\{X_{T_i} = i\}$ is like not conditioning at all,
because this event has probability 1. Second, it is really enough to establish independence
of holding times, because "can be expressed in terms of" is actually "is a function G of",
where the first function e.g. goes from $[0,\infty)^i$ to the space

$$\mathbb{X} = \{f : [0,t] \to \mathbb{N},\ f \text{ right-continuous},\ t \ge 0\}.$$

Now, we have a general result saying that if $G : \mathbb{A} \to \mathbb{X}$ and $H : \mathbb{B} \to \mathbb{Y}$ are (measurable)
functions, and $A$ and $B$ are independent random variables in $\mathbb{A}$ and $\mathbb{B}$, then $G(A)$ and
$H(B)$ are also independent. To prove this using our definition of independence, just note
that for all $E \subset \mathbb{X}$ and $F \subset \mathbb{Y}$ (measurable), we have

$$\mathbb{P}(G(A)\in E, H(B)\in F) = \mathbb{P}(A\in G^{-1}(E), B\in H^{-1}(F))$$
$$= \mathbb{P}(A\in G^{-1}(E))\,\mathbb{P}(B\in H^{-1}(F))$$
$$= \mathbb{P}(G(A)\in E)\,\mathbb{P}(H(B)\in F).$$

A more formal definition of stopping times is as follows.

Definition 22 (Stopping time) A random time $T$ taking values in $[0,\infty]$ is called a stopping
time for a continuous-time process $X = (X_t)_{t\ge 0}$ if, for all $t \ge 0$, the event $\{T \le t\}$ can be
expressed (in a measurable way) in terms of $(X_r)_{r\le t}$.

Of course, you should think of $X$ as an $\mathbb{N}$-valued simple birth process for the moment, but
you will appreciate that this definition makes sense in much more generality. In our next lecture
we will see examples where we are observing several independent processes on the same time
scale. The easiest example is two independent birth processes $X = (X^{(1)}, X^{(2)})$ modelling e.g.
two populations that we observe simultaneously.

Example 23 1. Let $X$ be a simple birth process starting from $X_0 = 0$. Then for all $i \ge 1$,
$T_i = \inf\{t \ge 0 : X_t = i\}$ is a stopping time since $\{T_i \le t\} = \{\exists s \le t : X_s = i\} = \{X_t \ge i\}$ (the
latter equality uses the property that birth processes do not decrease; thus, strictly speaking,
this equality is to mean that the two events differ by sets of probability zero in the sense that
we write $E = F$ if $\mathbb{P}(E \setminus F) = \mathbb{P}(F \setminus E) = 0$). $T_i$ is called the first hitting time of $i$. Clearly, for
$X$ modelling a Geiger counter and $i = 1000$, we are in the situation of our motivating example.

2. Let $X$ be a simple birth process. Then for $\varepsilon > 0$, the random time
$T_\varepsilon = \inf\{T_i > T_1 : T_i - T_{i-1} < \varepsilon\}$, i.e. the first time that two births have occurred within time at most $\varepsilon$ of one
another, is a stopping time.

In general, the first time that something happens, or that several things have happened 
successively, is a stopping time. It is essential that we don't have to look ahead to decide. In 
particular, the last time that something happens, e.g. the last birth time before time t, is not 
a stopping time, and the statement of the strong Markov property is usually wrong for such 
times. 



4.5 The Markov property for Poisson processes 



In Proposition 3, we reformulated the Markov property of the Poisson process in terms of
genuine independence rather than just conditional independence. Let $(X_t)_{t\ge 0}$ be a Poisson
process with rate $\lambda > 0$, i.e. a $(0, (\lambda_n)_{n\ge 0})$-birth process with $\lambda_n = \lambda$ for all $n \ge 0$, then

$$(X_r)_{r\le t} \coprod (X_{t+s} - X_t)_{s\ge 0} \sim (X_s)_{s\ge 0}.$$

Proof: By the version of the Markov property given in Proposition 17, we have for all $\ell \ge 0$,
that

$$(X_r)_{r\le t} \coprod_{X_t=\ell} (X_{t+s})_{s\ge 0} \sim (\ell + X_s)_{s\ge 0},$$

also since $(\ell + X_s)_{s\ge 0}$ is a Poisson process starting from $\ell$. Specifically, it is not hard to see
that for a general $(k, (\lambda_n)_{n\ge 0})$-birth process, the process $(\ell + X_s)_{s\ge 0}$ is a $(k+\ell, (\lambda_{n-\ell})_{n\ge 0})$-birth
process (for any choice of $\lambda_n$, $-\ell \le n < 0$), and here $\lambda_n = \lambda$ for all $n$. Clearly also

$$(X_r)_{r\le t} \coprod_{X_t=\ell} (X_{t+s} - \ell)_{s\ge 0} \sim (X_s)_{s\ge 0}.$$

However, conditionally given $X_t = \ell$, we have $X_{t+s} - \ell = X_{t+s} - X_t$ for all $s \ge 0$, so that

$$(X_r)_{r\le t} \coprod_{X_t=\ell} (X_{t+s} - X_t)_{s\ge 0} \sim (X_s)_{s\ge 0}.$$

Since the distribution of $(X_{t+s} - X_t)_{s\ge 0}$ (the distribution of a Poisson process of rate $\lambda$ starting
from 0) does not depend on $\ell \in \mathbb{N}$, we can now apply Exercise A.1.5 to conclude that this
conditional independence is in fact genuine independence, and the common conditional distribution
of $(X_{t+s} - X_t)_{s\ge 0}$ given $X_t = \ell$, $\ell \in \mathbb{N}$, is in fact also the unconditional distribution.

More precisely, we apply the statement of the exercise with the conditioning variable $X_t$, $Z = (X_{t+s} - X_t)_{s\ge 0}$ and
$M = (X_r)_{r\le t}$. Strictly speaking, $Z$ and $M$ are here random variables taking values in suitable
function spaces. This does not pose any problems from a measure-theoretic perspective once
"suitable" has been defined, but this definition of "suitable" is again beyond the scope of this
course. □

We can now also establish our assertions in Remark 2, namely $X$ has stationary increments
since

$$(X_{t+s} - X_t)_{s\ge 0} \sim (X_s)_{s\ge 0} \;\Rightarrow\; X_{t+u} - X_t \sim X_u \text{ for all } u \ge 0.$$

The subtlety of this statement is that we have for all $B \subset \{f : [0,\infty) \to \mathbb{N}\}$ (measurable)

$$\mathbb{P}((X_{t+s} - X_t)_{s\ge 0}\in B) = \mathbb{P}((X_s)_{s\ge 0}\in B),$$

and for all $n \ge 0$ and $B = B_{u,n} = \{f : [0,\infty) \to \mathbb{N} : f(u) = n\}$ this gives
$\mathbb{P}(X_{t+u} - X_t = n) = \mathbb{P}(X_u = n)$ as required. Independence of two increments $X_t$ and $X_{t+u} - X_t$ also follows
directly. A careful inductive argument allows us to extend this to the required independence of
$m$ successive increments $X_{t_j} - X_{t_{j-1}}$, $j = 1, \ldots, m$, because

$$\mathbb{P}(X_{t_{m+1}} - X_{t_m} = n_{m+1}, X_{t_m} - X_{t_{m-1}} = n_m, \ldots, X_{t_1} - X_{t_0} = n_1)$$
$$= \mathbb{P}(X_{t_{m+1}} - X_{t_m} = n_{m+1})\,\mathbb{P}(X_{t_m} - X_{t_{m-1}} = n_m, \ldots, X_{t_1} - X_{t_0} = n_1)$$

follows from the Markov property at $t = t_m$.

An alternative way to establish Proposition 3 is to give a direct proof of the stationarity and 
independence of increments in Remark 2, based on the lack of memory property, and deduce 
the Markov property from there. 



Lecture 5 

Continuous-time Markov chains 



Reading: Norris 2.1, 2.6 
Further reading: Grimmett-Stirzaker 6.9; Ross 6.1-6.3; Norris 2.9 

In this lecture, we generalise the notion of a birth process to allow deaths and other
transitions, actually transitions between any two states in a state space $\mathbb{S}$, just as for
discrete-time Markov chains and using the notion of a discrete-time Markov chain.

Continuous-time Markov chains are similar in many respects to discrete-time Markov
chains, but they also show important differences. Roughly, we will spend Lectures 5 and
6 exploring the differences and tools to handle these, then similarities in Lectures 7 and 8.

5.1 Definition and terminology 

Definition 24 Let $(M_n)_{n\ge 0}$ be a discrete-time Markov chain on $\mathbb{S}$ with transition
probabilities $\pi_{ij}$, $i,j \in \mathbb{S}$. Let $(Z_n)_{n\ge 0}$ be a sequence of conditionally independent
exponential random variables with conditional distributions $Exp(\lambda_{M_n})$ given $(M_n)_{n\ge 0}$, where
$\lambda_i \in (0,\infty)$, $i \in \mathbb{S}$. Then the process $X = (X_t)_{t\ge 0}$ defined by

$$X_t = M_n, \quad T_n \le t < T_{n+1},\ n \ge 0, \qquad X_t = \infty, \quad T_\infty \le t < \infty,$$

where $T_0 = 0$, $T_n = Z_0 + \ldots + Z_{n-1}$, $n \ge 1$, is called (minimal) continuous-time Markov
chain with jump probabilities $(\pi_{ij})_{i,j\in\mathbb{S}}$ and holding rates $(\lambda_i)_{i\in\mathbb{S}}$.

Usually, $T_\infty = \infty$, but the explosion phenomenon studied in Lecture 3 for the special
case of a birth process has to be taken into account in a general definition. This is the
so-called jump-chain holding-time definition of continuous-time Markov chains. There
are others, and we will point these out when we have established relevant connections.

Here $Z_n \sim Exp(\lambda_{M_n})$ given $M_n$ is short for $Z_n \sim Exp(\lambda_k)$ conditionally given $M_n = k$,
for all $k \in \mathbb{S}$, and conditional independence given $(M_n)_{n\ge 0}$ means that for all $m \ge 0$,
$Z_0, \ldots, Z_m$ are conditionally independent given $M_0 = k_0, \ldots, M_m = k_m$.

Example 25 (Birth processes) For $(k, (\lambda_n)_{n\ge 0})$-birth processes, we have $M_n = k+n$
deterministic, i.e. $\pi_{i,i+1} = 1$. Conditional independence of $Z_n$ given $M$ is independence,
and $Exp(\lambda_{M_n}) = Exp(\lambda_{k+n})$ is the unconditional distribution of $Z_n$.

The representation of the distribution of $X$ by $(\pi_{ij})_{i,j\in\mathbb{S}}$ and $(\lambda_i)_{i\in\mathbb{S}}$ is unique if we
assume furthermore $\pi_{ii} \in \{0,1\}$ so that $M$ either jumps straight away or remains in a
given state forever, and by setting $\lambda_i = 0$ if $\pi_{ii} = 1$. This eliminates the possibility of the
discrete chain proposing a "jump" from state $i$ to itself.

It is customary to represent the transition probabilities $\pi_{ij}$ and the holding rates $\lambda_i$
in a single matrix, called the Q-matrix, as follows. Define for $i \ne j$

$$q_{ij} = \lambda_i\pi_{ij} \quad\text{and}\quad q_{ii} = -\lambda_i.$$

Remark 26 $q_{ii} = -\sum_{j\ne i} q_{ij}$, since either $\sum_{j\ne i}\pi_{ij} = 1$ or $\lambda_i = 0$. As a consequence, the
row sums of a Q-matrix vanish.

$X$ is then also referred to as a continuous-time Markov chain with Q-matrix $Q$, a $(k, Q)$-
Markov chain if starting from $X_0 = M_0 = k$.
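The passage from jump probabilities and holding rates to a Q-matrix is mechanical, and a small sketch may help. The three-state example below is hypothetical (it is not taken from the notes), as is the function name `q_matrix`. State 2 is absorbing, so $\pi_{22} = 1$ and $\lambda_2 = 0$ as in the uniqueness convention above.

```python
def q_matrix(jump_probs, rates):
    """Build Q with q_ij = lambda_i * pi_ij for i != j and q_ii = -lambda_i."""
    n = len(rates)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            q[i][j] = -rates[i] if i == j else rates[i] * jump_probs[i][j]
    return q

# Hypothetical three-state chain; state 2 is absorbing (pi_22 = 1, lambda_2 = 0).
jump_probs = [[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]]
rates = [2.0, 3.0, 0.0]

Q = q_matrix(jump_probs, rates)
row_sums = [sum(row) for row in Q]  # Remark 26: every row sum should vanish
print(Q, row_sums)
```

The vanishing row sums are exactly the statement of Remark 26 for this example.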



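Example 25 (continued) For birth processes, we obtain

$$Q = \begin{pmatrix} -\lambda_0 & \lambda_0 & 0 & 0 & \cdots \\ 0 & -\lambda_1 & \lambda_1 & 0 & \cdots \\ 0 & 0 & -\lambda_2 & \lambda_2 & \cdots \\ \vdots & & & \ddots & \ddots \end{pmatrix}.$$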

As with discrete-time chains it is sometimes useful to specify an initial distribution
for $X_0$ that we call $\nu$, i.e. we let $\nu_i = \mathbb{P}(X_0 = i)$, $i \in \mathbb{S}$. Such a continuous-time Markov
chain will be referred to as a $(\nu, Q)$-Markov chain. Often

$$\nu_i = \delta_{i i_0} = \begin{cases} 1 & i = i_0, \\ 0 & i \ne i_0, \end{cases} \qquad\text{or short}\qquad \nu = \delta_{i_0}.$$

$\delta_{i i_0}$ as a number derived from two arguments, here $i$ and $i_0$, is called Kronecker delta; $\delta_{i_0}$
as a distribution only charging one point, here $i_0$, is called Dirac delta.

As an example of how an initial distribution can arise in practice, consider the number 
of customers arriving before a shop opens at time t = 0. As this number typically varies 
from day to day, it is natural to model it by a random variable, and to specify its 
distribution. 



5.2 Construction 

Defining lots of conditional distributions for infinite families of random variables requires 

some care in a measure-theoretic context. Also outside the measure-theoretic context, 
it is conceptually easier to express complicated random objects such as continuous-time 
Markov chains, in terms of a countable family of independent random variables. We 
have already done this for Poisson processes and can therefore use independent Pois- 
son processes as building blocks. This leads to the following maze construction of a 
continuous-time Markov chain. It is the second appearance of the theory of competing 
exponentials and nicely illustrates the evolution of continuous-time Markov chains:

Proposition 27 Let $M_0 \sim \nu$ and $(N_t^{ij})_{t\ge 0}$, $i,j \in \mathbb{S}$, $i \ne j$, independent Poisson processes
with rates $q_{ij}$. Then define $T_0 = 0$ and for $n \ge 0$

$$T_{n+1} = \inf\{t > T_n : N_t^{M_n j} \ne N_{T_n}^{M_n j} \text{ for some } j \ne M_n\}$$

and

$$M_{n+1} = j \quad\text{if } T_{n+1} < \infty \text{ and } N_{T_{n+1}}^{M_n j} \ne N_{T_n}^{M_n j}.$$

Then

$$X_t = M_n, \quad T_n \le t < T_{n+1},\ n \ge 0, \qquad X_t = \infty, \quad T_\infty \le t < \infty,$$

is a $(\nu, Q)$-Markov chain.

Think of the state space $\mathbb{S}$ as a maze where $q_{ij} > 0$ signifies that there is a gate from
state $i \in \mathbb{S}$ to state $j \in \mathbb{S}$. Each gate $(i,j)$ opens at the event times of a Poisson process
$N^{ij}$. If after a given number of transitions the current state is $i$, then the next jump
time of $X$ is when the next gate leading away from $i$ opens. If this gate leads from $i$ to
$j$, then $j$ is the new state for $X$. Think of each Poisson process as a clock that rings at
its event times. A ringing clock here corresponds to a gate opening instantaneously (i.e.
immediately closing afterwards).
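The maze picture can be turned into a simulation sketch. The code below assumes a finite state space and, rather than storing whole Poisson processes, draws a fresh $Exp(q_{ij})$ clock for every gate out of the current state at each jump. By the lack of memory property this should give the same distribution as waiting for the next event of each $N^{ij}$, but it is a simplification of the construction in Proposition 27, not the construction itself. The function names and the example rates are hypothetical.

```python
import random

def first_gate(q_row, i, rng):
    """From state i, run one Exp(q_ij) clock per gate j != i with q_ij > 0; return (ring time, gate j)."""
    best_time, best_state = float("inf"), i
    for j, rate in enumerate(q_row):
        if j != i and rate > 0.0:
            ring = rng.expovariate(rate)
            if ring < best_time:
                best_time, best_state = ring, j
    return best_time, best_state

def simulate_path(Q, start, horizon, rng):
    """Return the list of (jump time, new state) pairs up to the time horizon."""
    t, state, path = 0.0, start, [(0.0, start)]
    while True:
        hold, nxt = first_gate(Q[state], state, rng)
        if t + hold > horizon:  # also covers absorbing states, where hold is infinite
            return path
        t, state = t + hold, nxt
        path.append((t, state))

# Hypothetical example: from state 0 the gates to 1 and 2 open at rates 1 and 3.
rng = random.Random(2)
q_row = [-4.0, 1.0, 3.0]
samples = [first_gate(q_row, 0, rng) for _ in range(100_000)]
mean_hold = sum(t for t, _ in samples) / len(samples)                 # compare with 1/lambda_0 = 0.25
frac_to_2 = sum(1 for _, j in samples if j == 2) / len(samples)       # compare with q_02/lambda_0 = 0.75

Q = [q_row, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # states 1 and 2 absorbing
path = simulate_path(Q, 0, 1000.0, rng)
print(mean_hold, frac_to_2, path)
```

The two estimates correspond to the two facts checked in the proof: the first holding time has rate $\lambda_i = \sum_{j \ne i} q_{ij}$, and the first gate to open is $j$ with probability $q_{ij}/\lambda_i$.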

Proof: We have to check that the process defined here has the correct jump chain, holding
times and dependence structure. Clearly $M_0 = X_0$ has the right starting distribution.
Given $M_0 = i$, the first jump occurs at the first time at which one of the Poisson processes
$N^{ij}$, $j \ne i$, has its first jump. This time is a minimum of independent exponential random
variables of parameters $q_{ij}$, $T_1 = \inf\{T_1^{ij}, j \ne i\}$, where $T_1^{ij}$ is the first jump time of $N^{ij}$, for which

$$\mathbb{P}(T_1 > t) = \mathbb{P}(T_1^{ij} > t \text{ for all } j \ne i) = \prod_{j\ne i}\mathbb{P}(T_1^{ij} > t) = \exp\Big(-t\sum_{j\ne i} q_{ij}\Big) = e^{-\lambda_i t},$$

i.e. $Z_0 = T_1 \sim Exp(\lambda_{M_0})$ given $M_0$. Furthermore, by the theory of competing exponentials,

$$\mathbb{P}(M_1 = j \mid M_0 = i) = \mathbb{P}(T_1^{ij} = T_1) = \frac{q_{ij}}{\lambda_i} = \pi_{ij}.$$

For independence, the second and inductively all further holding times and transitions, we
apply the strong Markov property of the Poisson processes (Fact 21, or a combination of
the lack of memory property at minima of exponential variables $T_1^{ij}$ and at an independent
exponential variable $T_1$ for $N^{kj}$, $k \ne i$, as on assignment sheet 1) to see that the post-$T_1$
Poisson processes $(N^{ij}_{T_1+s} - N^{ij}_{T_1})_{s\ge 0}$ are Poisson processes themselves, and therefore the
previous argument completes the induction step and hence the proof. $T_1$ is a stopping
time since

$$\{T_1 \le t\} = \bigcup_{j\ne i}\{T_1^{ij} \le t\}$$

and the latter were expressed in terms of $(N_r^{ij})_{r\le t}$, $j \ne i$, respectively, in Example 23. □

Corollary 28 (Markov property) Let $X$ be a $(\nu, Q)$-Markov chain and $t \ge 0$ a fixed
time. Then given $X_t = k$, $(X_r)_{r\le t}$ and $(X_{t+s})_{s\ge 0}$ are independent, and the conditional
distribution of $(X_{t+s})_{s\ge 0}$ is that of a $(k, Q)$-Markov chain.

Proof: The post-$t$ Poisson processes $(N^{ij}_{t+s} - N^{ij}_t)_{s\ge 0}$ are themselves Poisson processes,
independent of the pre-$t$ Poisson processes $(N^{ij}_r)_{0\le r\le t}$. The post-$t$ behaviour of $X$ only
depends on $X_t$ and the post-$t$ Poisson processes. If we condition on $\{X_t = k\}$, then
clearly $(X_{t+s})_{s\ge 0}$ is starting from $k$. □



Continuous-time Markov chains also have the strong Markov property. We leave the 
formulation to the reader. Its proof is beyond the scope of this course. 



5.3 M/M/1 and M/M/s queues 

Example 29 (M/M/1 queue) Let us model by $X_t$ the number of customers in a
single-server queueing system at time $t \ge 0$, including any customer currently being
served. We assume that new customers arrive according to a Poisson process with rate
$\lambda$, and that service times are independent $Exp(\mu)$ distributed.

Given a queue size of $n$, two transitions are possible. If a customer arrives (at rate
$\lambda$), $X$ increases to $n+1$. If the customer being served leaves (at rate $\mu$), then $X$ decreases
to $n-1$. If no customer is in the system, only the former can happen. This amounts to
a Q-matrix

$$Q = \begin{pmatrix} -\lambda & \lambda & & & \\ \mu & -\mu-\lambda & \lambda & & \\ & \mu & -\mu-\lambda & \lambda & \\ & & \ddots & \ddots & \ddots \end{pmatrix}.$$

$X$ is indeed a continuous-time Markov chain, since given state $n \ge 1$ ($n = 0$) and
two (one) independent clocks $Exp(\lambda)$ and $Exp(\mu)$ (unless $n = 0$) ticking, the theory of
competing exponential clocks (Exercise A.1.2) shows that the system starts afresh with
the residual clock and the new clock (except $n = 1$ and transition to 0) exponential and
independent of the past, and the induction proceeds.



Example 30 (M/M/s queue) If there are $s \ge 1$ servers in the system, the rate at
which customers leave is $s$-fold, provided there are at least $s$ customers in the system.
We obtain the Q-matrix

$$Q = \begin{pmatrix}
-\lambda & \lambda & & & & & \\
\mu & -\mu-\lambda & \lambda & & & & \\
 & 2\mu & -2\mu-\lambda & \lambda & & & \\
 & & \ddots & \ddots & \ddots & & \\
 & & & s\mu & -s\mu-\lambda & \lambda & \\
 & & & & s\mu & -s\mu-\lambda & \lambda \\
 & & & & & \ddots & \ddots
\end{pmatrix}$$

which is maybe better represented by $q_{i,i+1} = \lambda$ for all $i \ge 0$, $q_{i,i-1} = i\mu$ for $1 \le i \le s$,
$q_{i,i-1} = s\mu$ for $i \ge s$, $q_{ii} = -i\mu - \lambda$ for $0 \le i \le s$, $q_{ii} = -s\mu - \lambda$ for $i \ge s$, and $q_{ij} = 0$
otherwise. A slight variation of the argument for Example 29 shows that the M/M/s
queue is a continuous-time Markov chain.
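The entrywise description of the M/M/s Q-matrix translates directly into code. The sketch below builds only a finite top-left block of the infinite matrix. The truncation size, the parameter values and the function name `mms_q_matrix` are illustrative assumptions. Note that the last row of a truncated block does not sum to zero, because the arrival transition out of the block is cut off.

```python
def mms_q_matrix(lam, mu, s, size):
    """Top-left size x size block of the M/M/s Q-matrix on states 0, ..., size - 1."""
    q = [[0.0] * size for _ in range(size)]
    for i in range(size):
        service = min(i, s) * mu      # i*mu while i <= s, then s*mu
        if i + 1 < size:
            q[i][i + 1] = lam         # arrival
        if i >= 1:
            q[i][i - 1] = service     # departure
        q[i][i] = -(lam + service)    # q_ii = -(total rate of leaving i)
    return q

# Hypothetical parameters: arrival rate 1, service rate 2, two servers, five states kept.
Q = mms_q_matrix(1.0, 2.0, 2, 5)
inner_row_sums = [sum(row) for row in Q[:-1]]  # all rows except the truncated last one
for row in Q:
    print(row)
```

Setting `s = 1` gives the corresponding block of the M/M/1 Q-matrix of Example 29.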
Lecture 6

Transition semigroups 



Reading: Norris 2.8, 3.1 

Further reading: Grimmett-Stirzaker 6.8 (12)-(17), 6.9; Ross 6.4; Norris 2.7, 2.10 

In this lecture we establish transition matrices $P(t)$, $t \ge 0$, for continuous-time Markov
chains. These matrices are the analogues of the $n$-step transition matrices
$\Pi^n = (\pi_{ij}^{(n)})_{i,j\in\mathbb{S}}$, $n \ge 0$, for discrete-time Markov chains. While we will continue to use the
Q-matrix to specify the distribution of a continuous-time Markov chain, transition matrices
$P(t)$ give some of the most important probabilities related to a continuous-time Markov
chain, but they are available explicitly only in a limited range of examples.

6.1 The semigroup property of transition matrices 

As a consequence of the Markov property of continuous-time Markov chains, the
probabilities $\mathbb{P}(X_{t+s} = j \mid X_t = i)$ do not depend on $t$. We denote by

$$p_{ij}(s) = \mathbb{P}(X_{t+s} = j \mid X_t = i) \quad\text{and}\quad P(s) = (p_{ij}(s))_{i,j\in\mathbb{S}}$$

the $s$-step transition probabilities and $s$-step transition matrix.

Example 31 For a Poisson process with rate $\lambda$, we have for $j \ge i$ or $n \ge 0$

$$p_{ij}(t) = \frac{(\lambda t)^{j-i}}{(j-i)!}e^{-\lambda t} \quad\text{or}\quad p_{i,i+n}(t) = \frac{(\lambda t)^n}{n!}e^{-\lambda t}$$

by Remark 2. For fixed $t \ge 0$ and $i \ge 0$, these are $Poi(\lambda t)$ probabilities, shifted by $i$.

Proposition 32 $(P(t))_{t\ge 0}$ is a semigroup, i.e. for all $t, s \ge 0$ we have $P(t)P(s) = P(t+s)$
in the sense of matrix multiplication, and $P(0) = I$, the identity matrix.

Proof: Just note that for all $i, k \in \mathbb{S}$

$$p_{ik}(t+s) = \sum_{j\in\mathbb{S}}\mathbb{P}(X_t = j \mid X_0 = i)\,\mathbb{P}(X_{t+s} = k \mid X_t = j, X_0 = i) = \sum_{j\in\mathbb{S}} p_{ij}(t)\,p_{jk}(s),$$

where we applied the Markov property. □

We will remind ourselves of a fixed initial state $X_0 = i$ by writing $\mathbb{P}(\cdot \mid X_0 = i)$ or $\mathbb{P}_i(\cdot)$.
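For the Poisson process of Example 31, the semigroup property can be checked numerically on a finite block of $P(t)$. Because $P(t)$ is upper triangular, the top-left block of a product only involves entries inside the block, so the truncation loses nothing for this check. The rate, the times, the block size and the helper names are illustrative assumptions.

```python
import math

def poisson_transition_matrix(lam, t, size):
    """Top-left size x size block of P(t) for a Poisson process with rate lam."""
    p = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i, size):
            n = j - i
            p[i][j] = math.exp(-lam * t) * (lam * t) ** n / math.factorial(n)
    return p

def matmul(a, b):
    """Plain matrix product of two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

lam, t, s, size = 2.0, 0.3, 0.5, 6
lhs = matmul(poisson_transition_matrix(lam, t, size), poisson_transition_matrix(lam, s, size))
rhs = poisson_transition_matrix(lam, t + s, size)
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(size) for j in range(size))
p_zero = poisson_transition_matrix(lam, 0.0, 3)  # should be the identity block
print(max_gap, p_zero)
```

Entry by entry, the identity being verified is the binomial theorem applied to $(\lambda t + \lambda s)^n$.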
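6.2 Backward equations

The following result is useful to calculate transition probabilities.

Proposition 33 The transition matrices $(P(t))_{t\ge 0}$ of a minimal $(\nu, Q)$-Markov chain
satisfy the backward equation

$$P'(t) = QP(t)$$

with initial condition $P(0) = I$, the identity matrix.

Furthermore, $P(t)$ is the minimal nonnegative solution in the sense that all other nonnegative
solutions $\tilde P(t)$ satisfy $\tilde p_{ik}(t) \ge p_{ik}(t)$ for all $i, k \in \mathbb{S}$.

Proof: We first show that $(P(t))_{t\ge 0}$ solves $P'(t) = QP(t)$, i.e. for all $i, k \in \mathbb{S}$, $t \ge 0$

$$p'_{ik}(t) = \sum_{j\in\mathbb{S}} q_{ij}\,p_{jk}(t).$$

We start by a one-step analysis (using the strong Markov property at the first jump time
$T_1$, or directly identifying the structure of the post-$T_1$ process) to get

$$p_{ik}(t) = \mathbb{P}_i(X_t = k) = \int_0^\infty \mathbb{P}_i(X_t = k \mid T_1 = s)\,\lambda_i e^{-\lambda_i s}\,ds
= \delta_{ik}e^{-\lambda_i t} + \int_0^t \sum_{j\ne i}\pi_{ij}\,p_{jk}(t-s)\,\lambda_i e^{-\lambda_i s}\,ds,$$

i.e., after the substitution $u = t - s$,

$$e^{\lambda_i t}p_{ik}(t) = \delta_{ik} + \int_0^t \sum_{j\ne i}\lambda_i\pi_{ij}\,e^{\lambda_i u}\,p_{jk}(u)\,du.$$

Clearly this implies that $p_{ik}$ is differentiable and we obtain

$$e^{\lambda_i t}\big(\lambda_i p_{ik}(t) + p'_{ik}(t)\big) = \sum_{j\ne i}\lambda_i\pi_{ij}\,e^{\lambda_i t}\,p_{jk}(t) = \sum_{j\ne i} q_{ij}\,e^{\lambda_i t}\,p_{jk}(t),$$

which after cancellation of $e^{\lambda_i t}$ and by $\lambda_i = -q_{ii}$ is what we require.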

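Suppose now, we have another non-negative solution $\tilde p_{ik}(t)$. Then, by integration, $\tilde p_{ik}(t)$ also
satisfies the above integral equations (the $\delta_{ik}$ come from the initial conditions). Trivially

$$T_0 = 0 \;\Rightarrow\; \mathbb{P}_i(X_t = k, t < T_0) = 0 \le \tilde p_{ik}(t) \quad\text{for all } i, k \in \mathbb{S} \text{ and } t \ge 0.$$

If for some $n \in \mathbb{N}$

$$\mathbb{P}_i(X_t = k, t < T_n) \le \tilde p_{ik}(t) \quad\text{for all } i, k \in \mathbb{S} \text{ and } t \ge 0,$$

then as above

$$\mathbb{P}_i(X_t = k, t < T_{n+1}) = \delta_{ik}e^{-\lambda_i t} + \int_0^t \sum_{j\ne i}\lambda_i e^{-\lambda_i s}\,\pi_{ij}\,\mathbb{P}_j(X_{t-s} = k, t-s < T_n)\,ds$$
$$\le \delta_{ik}e^{-\lambda_i t} + \int_0^t \sum_{j\ne i}\lambda_i e^{-\lambda_i s}\,\pi_{ij}\,\tilde p_{jk}(t-s)\,ds = \tilde p_{ik}(t)$$

and therefore

$$p_{ik}(t) = \lim_{n\to\infty}\mathbb{P}_i(X_t = k, t < T_n) \le \tilde p_{ik}(t)$$

as required. We conclude that $p_{ik}(t)$ is the minimal non-negative solution to the backward
equation. □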

Note that non-minimal solutions that satisfy the conditions for transition matrices
can only exist if $\sum_{k\in\mathbb{S}} p_{ik}(t) < 1$ for some $i \in \mathbb{S}$ and $t \ge 0$, i.e. the continuous-time
Markov chain must be explosive in the sense that $\mathbb{P}(T_\infty < \infty) > 0$, and then
$p_{i\infty}(t) = 1 - \sum_{k\in\mathbb{S}} p_{ik}(t) > 0$.

6.3 Forward equations 

Proposition 34 If $\mathbb{S}$ is finite, then the transition matrices $(P(t))_{t\ge 0}$ of a $(\nu, Q)$-Markov
chain satisfy the forward equation

$$P'(t) = P(t)Q$$

with initial condition $P(0) = I$, the identity matrix.

Proof: See Assignment question A.3.5. □

Fact 35 If $\mathbb{S}$ is infinite, then the statement of the proposition still holds for minimal
$(\nu, Q)$-Markov chains.

Furthermore, $P(t)$ is the minimal nonnegative solution.

The proof of the proposition can be adapted under a uniformity assumption. This 
assumption will be sufficient for most practical purposes, but the general case is best 
proved by conditioning on the last jump before t. Since this is not a stopping time, the 
Markov property does not apply and calculations have to be done by hand, which is quite 
technical, see Norris 2.8. 

In fact, both forward and backward equations admit unique solutions if the corre- 
sponding continuous-time Markov chain does not explode. This is the case in all practi- 
cally relevant situations. The non-uniqueness arises since Markovian extensions of explo- 
sive chains other than the minimal extension that we consider, will also have transition 
semigroups that satisfy the backward equations.

Remark 36 Transition semigroups and the Markov property can form the basis for a definition
of continuous-time Markov chains. In order to match our definition, we could say that a $(\nu, Q)$-
Markov chain is a process with right-continuous sample paths in $\mathbb{S}$ such that

$$\mathbb{P}(X_{t_{n+1}} = i_{n+1} \mid X_{t_0} = i_0, \ldots, X_{t_n} = i_n) = p_{i_n, i_{n+1}}(t_{n+1} - t_n)$$

for all $0 \le t_0 \le t_1 \le \ldots \le t_{n+1}$ and $i_0, \ldots, i_{n+1} \in \mathbb{S}$, where $P(t)$ satisfies the forward equations.
See Norris 2.8.



6.4 Example 

Example 37 For the Poisson process $q_{i,i+1} = \lambda$, $q_{ii} = -\lambda$, $q_{ij} = 0$ otherwise, hence we
have forward equations

$$p'_{ii}(t) = -p_{ii}(t)\lambda, \quad i \in \mathbb{N},$$
$$p'_{i,i+n}(t) = p_{i,i+n-1}(t)\lambda - p_{i,i+n}(t)\lambda, \quad i \in \mathbb{N},\ n \ge 1,$$

and it is easily seen inductively (fix $i$ and proceed $n = 0, 1, 2, \ldots$) that Poisson probabilities

$$p_{i,i+n}(t) = \frac{(\lambda t)^n}{n!}e^{-\lambda t}$$

are solutions. Writing $i + n$ rather than $j$ is convenient because of the stationarity of
increments in this special case of the Poisson process. Alternatively, we may consider the
backward equations

$$p'_{ii}(t) = -\lambda p_{ii}(t), \quad i \in \mathbb{N},$$
$$p'_{ij}(t) = \lambda p_{i+1,j}(t) - \lambda p_{ij}(t), \quad i \in \mathbb{N},\ j \ge i+1,$$

and solve inductively (fix $j$ and proceed $i = j, j-1, \ldots, 0$). We have seen an easier way
to derive the Poisson transition probabilities in Remark 2. The link between the two
ways is revealed by the passage to probability generating functions

$$G_i(z,t) = \mathbb{E}_i(z^{X_t})$$

which then have to satisfy differential equations

$$\frac{\partial}{\partial t}G_i(z,t) = \sum_{n=0}^{\infty} z^{i+n}p'_{i,i+n}(t) = \lambda(z-1)\,G_i(z,t), \qquad G_i(z,0) = \mathbb{E}_i(z^{X_0}) = z^i.$$

Solutions for these equations are obvious. In general, if we have $G_i$ sufficiently smooth
in $t$ and $z$, we can derive from differential equations for probability generating functions
differential equations for moments

$$m_i(t) = \mathbb{E}_i(X_t) = \frac{\partial}{\partial z}G_i(z,t)\Big|_{z=1-}$$

that yield here

$$m'_i(t) = \lambda\,G_i(z,t)\big|_{z=1-} = \lambda, \qquad m_i(0) = \mathbb{E}_i(X_0) = i.$$

Often (even in this case), this can be solved more easily than the differential equation for
probability generating functions. Together with a similar equation for the variance, we
can capture the two most important distributional features of a model.
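The forward equations of Example 37 can also be integrated numerically and compared with the Poisson probabilities. The sketch below uses the classical fourth-order Runge-Kutta scheme, which is a choice made here and not something used in the notes. It starts from state $i = 0$ and keeps the first 40 equations, which is harmless because the equation for $p_{0,n}$ only involves $p_{0,n-1}$ and $p_{0,n}$. All names and parameter values are illustrative.

```python
import math

def forward_rhs(p, lam):
    """Right-hand side lam * p_{0,n-1} - lam * p_{0,n} for n = 0, ..., len(p) - 1."""
    return [lam * (p[n - 1] if n > 0 else 0.0) - lam * p[n] for n in range(len(p))]

def solve_forward(lam, t_end, size, steps=1000):
    """Fourth-order Runge-Kutta for the first `size` forward equations, started in state 0."""
    h = t_end / steps
    p = [1.0] + [0.0] * (size - 1)  # initial condition p_{0,n}(0) = delta_{0n}
    for _ in range(steps):
        k1 = forward_rhs(p, lam)
        k2 = forward_rhs([x + 0.5 * h * k for x, k in zip(p, k1)], lam)
        k3 = forward_rhs([x + 0.5 * h * k for x, k in zip(p, k2)], lam)
        k4 = forward_rhs([x + h * k for x, k in zip(p, k3)], lam)
        p = [x + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
    return p

lam, t_end, size = 2.0, 1.5, 40
numeric = solve_forward(lam, t_end, size)
exact = [math.exp(-lam * t_end) * (lam * t_end) ** n / math.factorial(n) for n in range(size)]
max_error = max(abs(a - b) for a, b in zip(numeric, exact))
mean_numeric = sum(n * p for n, p in enumerate(numeric))  # compare with m_0(t) = lam * t
print(max_error, mean_numeric)
```

The numerical mean should be close to $\lambda t$, in line with the moment equation $m'_i(t) = \lambda$, $m_i(0) = i$ for $i = 0$.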



6.5 Matrix exponentials 



It is tempting to say that the differential equation $P'(t) = QP(t)$, $P(0) = I$, has as its unique 
solution $P(t) = e^{tQ}$, that the same is true for $P'(t) = P(t)Q$, $P(0) = I$, and that the functional 
equation $P(s)P(t) = P(s+t)$ also has as its solutions precisely $P(t) = e^{tQ}$ for some $Q$. Remember, 
however, that $Q$ is a matrix, and, in general, the state space is countably infinite. Therefore, 
we have to define $e^{tQ}$ in the first place, and to do this, we could use the (minimal or unique) 
solutions to the differential equations. Another, more direct, possibility is the exponential series 

$$e^{tQ} := \sum_{n \ge 0} \frac{(tQ)^n}{n!},$$

where $tQ$ is scalar multiplication, i.e. multiplication of each entry of $Q$ by the scalar $t$, and 
$(tQ)^n$ is an $n$-fold matrix product. Let us focus on a finite state space $\mathbb{S}$, so that the only 
limiting procedure is the series over $n \ge 0$. It is natural to consider a series of matrices as a 
matrix consisting of the series of the corresponding entries. In fact, this works in full generality, 
as long as $\mathbb{S}$ is finite, see Norris 2.10. 
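For a finite state space, the exponential series can be evaluated numerically by truncation. The following is a minimal sketch only: the helper names `mat_mul` and `mat_exp`, the cut-off of 60 terms and the two-state chain with rates `a` and `b` are illustrative assumptions, not part of the notes. For that two-state chain, $p_{00}(t) = \frac{b}{a+b} + \frac{a}{a+b} e^{-(a+b)t}$ gives an independent check.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, t, terms=60):
    """Approximate e^{tQ} = sum_{n>=0} (tQ)^n / n! by a truncated series."""
    n = len(q)
    tq = [[t * x for x in row] for row in q]
    # term holds (tQ)^k / k!, starting with the identity for k = 0
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, tq)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total


# Hypothetical two-state chain: rate a from 0 to 1, rate b from 1 to 0.
a, b = 1.0, 2.0
Q = [[-a, a], [b, -b]]
P = mat_exp(Q, 0.7)
p00_exact = b / (a + b) + a / (a + b) * math.exp(-(a + b) * 0.7)
```

The same function can be used to check the semigroup property $P(s)P(t) = P(s+t)$ numerically, for example with $s = 0.3$ and $t = 0.4$.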

For infinite state space, this is much harder, since every entry in a matrix product is then 
already a limiting quantity and one will either need uniform control over entries or use operator 
norms to make sense of the series of matrices. The limited benefits from such a theory are not 
worth setting up the technical apparatus in our context. 
Lecture 7 



The class structure of 
continuous-time Markov chains 

Reading: Norris 3.2-3.5 
Further reading: Grimmett-Stirzaker 6.9 

In this lecture, we introduce for continuous-time chains the notions of irreducibility 
and positive recurrence that will be needed for the convergence theorems in Lecture 8. 

7.1 Communicating classes and irreducibility 

We define the class structure characteristics as for discrete-time Markov chains. 

Definition 38 Let $X$ be a continuous-time Markov chain. 

(a) We say that $i \in \mathbb{S}$ leads to $j \in \mathbb{S}$ and write $i \to j$ if 

$$\mathbb{P}_i(X_t = j \text{ for some } t \ge 0) = \mathbb{P}_i(T_{\{j\}} < \infty) > 0, \quad \text{where } T_{\{j\}} = \inf\{t \ge 0 : X_t = j\}.$$

(b) We say $i$ communicates with $j$ and write $i \leftrightarrow j$ if both $i \to j$ and $j \to i$. 

(c) We say $A \subset \mathbb{S}$ is a communicating class if it is an equivalence class for the equivalence 
relation $\leftrightarrow$ on $\mathbb{S}$, i.e. if for all $i, j \in A$ we have $i \leftrightarrow j$ and $A$ is maximal with 
this property (for all $k \in \mathbb{S} - A$, $i \in A$ at most one of $i \to k$, $k \to i$ holds). 

(d) We say $A$ is a closed class if there is no $i \in A$, $j \in \mathbb{S} - A$ with $i \to j$, i.e. the chain 
cannot leave $A$. 

(e) We say that $i$ is an absorbing state if $\{i\}$ is closed. 

(f) We say that $X$ is irreducible if $\mathbb{S}$ is (the only) communicating class. 

In the following we denote by $M = (M_n)_{n \ge 0}$ the jump chain and by $(Z_n)_{n \ge 0}$ the holding 
times that we used in the construction of $X = (X_t)_{t \ge 0}$. 
Proposition 39 Let $X$ be a minimal continuous-time Markov chain. For $i, j \in \mathbb{S}$, $i \neq j$, 
the following are equivalent: 

(i) $i \to j$ for $X$. 

(ii) $i \to j$ for $M$. 

(iii) There is a sequence $(i_0, \ldots, i_n)$, $i_k \in \mathbb{S}$, from $i_0 = i$ to $i_n = j$ such that $\prod_{k=0}^{n-1} q_{i_k, i_{k+1}} > 0$. 

(iv) $p_{ij}(t) > 0$ for all $t > 0$. 

(v) $p_{ij}(t) > 0$ for some $t > 0$. 

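Proof: Implications (iv)$\Rightarrow$(v)$\Rightarrow$(i)$\Rightarrow$(ii) are clear. 

(ii)$\Rightarrow$(iii): From the discrete-time theory, we know that $i \to j$ for $M$ implies that 
there is a path $(i_0, \ldots, i_n)$ from $i$ to $j$ with 

$$\prod_{k=0}^{n-1} \pi_{i_k, i_{k+1}} > 0, \quad \text{hence} \quad \prod_{k=0}^{n-1} q_{i_k, i_{k+1}} = \prod_{k=0}^{n-1} \lambda_{i_k} \pi_{i_k, i_{k+1}} > 0,$$

since $\lambda_m = 0$ if and only if $\pi_{mm} = 1$. 

(iii)$\Rightarrow$(iv): If $q_{ij} > 0$, then we can get a lower bound for $p_{ij}(t)$ by only allowing one 
transition in $[0, t]$ by 

$$p_{ij}(t) \ge \mathbb{P}_i(Z_0 < t, M_1 = j, Z_1 > t) = \mathbb{P}_i(Z_0 < t)\, \mathbb{P}_i(M_1 = j)\, \mathbb{P}(Z_1 > t \mid M_1 = j) = (1 - e^{-\lambda_i t})\, \pi_{ij}\, e^{-\lambda_j t} > 0$$

for all $t > 0$, hence in general for the path $(i_0, \ldots, i_n)$ given by (iii) 

$$p_{ij}(t) = \mathbb{P}_i(X_t = j) \ge \mathbb{P}_i(X_{kt/n} = i_k \text{ for all } k = 1, \ldots, n) = \prod_{k=0}^{n-1} p_{i_k, i_{k+1}}(t/n) > 0$$

for all $t > 0$. For the last equality, we used the Markov property which implies that for 
all $m = 1, \ldots, n$ 

$$\mathbb{P}(X_{mt/n} = i_m \mid X_{kt/n} = i_k \text{ for all } k = 0, \ldots, m-1) = \mathbb{P}(X_{mt/n} = i_m \mid X_{(m-1)t/n} = i_{m-1}).$$

□ 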

Condition (iv) shows that the situation is simpler than in discrete time, where it may 
be possible to reach a state, but only after a certain length of time, and then only 
periodically. 
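Condition (iii) of Proposition 39 reduces the class structure to a graph search over the strictly positive off-diagonal entries of $Q$. The sketch below illustrates this for a finite state space; the function names and the small three-state $Q$-matrix (state 0 absorbing) are hypothetical choices made only for this illustration.

```python
def reachable(q, i):
    """Set of states j with i -> j (including i itself), found by
    following strictly positive off-diagonal rates as in (iii)."""
    seen = {i}
    stack = [i]
    while stack:
        u = stack.pop()
        for v, rate in enumerate(q[u]):
            if v != u and rate > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def communicating_classes(q):
    """Partition the state space {0, ..., n-1} into communicating classes."""
    n = len(q)
    reach = [reachable(q, i) for i in range(n)]
    classes, assigned = [], set()
    for i in range(n):
        if i not in assigned:
            cls = sorted(j for j in reach[i] if i in reach[j])
            classes.append(cls)
            assigned.update(cls)
    return classes


def is_closed(q, cls):
    """A class is closed if no state outside it can be reached from it."""
    return all(reachable(q, i) <= set(cls) for i in cls)


# Hypothetical example: state 0 is absorbing, states 1 and 2 communicate.
Q = [
    [0.0, 0.0, 0.0],
    [1.0, -3.0, 2.0],
    [0.0, 1.0, -1.0],
]
classes = communicating_classes(Q)
```

Here `classes` consists of the closed class of the absorbing state and the open class formed by the other two states.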
7.2 Recurrence and transience 

Definition 40 Let $X$ be a continuous-time Markov chain. 

(a) $i \in \mathbb{S}$ is called recurrent if 

$$\mathbb{P}_i(\{t \ge 0 : X_t = i\} \text{ is unbounded}) = 1.$$

(b) $i \in \mathbb{S}$ is called transient if 

$$\mathbb{P}_i(\{t \ge 0 : X_t = i\} \text{ is bounded}) = 1.$$



Note that if X can explode starting from i and if X is a minimal continuous-time 
Markov chain, then i is certainly not recurrent. 

Recall that $N_i = \inf\{n \ge 1 : M_n = i\}$ is called the first passage time of $M$ to state $i$. 
We define 

$$H_i = T_{N_i} = \inf\{t \ge T_1 : X_t = i\},$$

the first passage time of $X$ to state $i$. Note that we require the chain to do at least one 
jump. This is to force $X$ to leave $i$ first if $X_0 = i$. We also define the successive passage 
times by $N_i^{(1)} = N_i$ and $N_i^{(m+1)} = \inf\{n > N_i^{(m)} : M_n = i\}$, $m \ge 1$, for $M$, and 

$$H_i^{(m)} = T_{N_i^{(m)}}, \qquad m \ge 1,$$

for $X$. 

Proposition 41 A state $i \in \mathbb{S}$ is recurrent (transient) for a minimal continuous-time Markov 
chain $X$ if and only if $i$ is recurrent (transient) for the jump chain $M$. 

Proof: Suppose $i$ is recurrent for the jump chain $M$, i.e. $M$ visits $i$ infinitely often, at 
steps $(N_i^{(m)})_{m \ge 1}$. If we denote by $1_{\{X_0 = i\}}$ the random variable that is 1 if $X_0 = i$ and 0 
otherwise, the total amount of time that $X$ spends at $i$ is 

$$1_{\{X_0 = i\}} Z_0 + \sum_{m \ge 1} Z_{N_i^{(m)}} = \infty$$

with probability 1 by the argument for Proposition 16 (convergent and divergent sums of 
independent exponential variables) since $Z_{N_i^{(m)}} \sim \mathrm{Exp}(\lambda_i)$ and the sum of their (identical!) 
inverse parameters is infinite. In particular $\{t \ge 0 : X_t = i\}$ must be unbounded with 
probability 1. 

Suppose $i$ is transient for the jump chain $M$. Then there is a last step $L < \infty$ away 
from $i$ and 

$$\{t \ge 0 : X_t = i\} \subset [0, T_L)$$

is bounded with probability 1. 

The inverse implications are now obvious since $i$ can only be either recurrent or 
transient for $M$ and we constructed all minimal continuous-time Markov chains from 
jump chains. 

□ 
From this result and the analogous properties for discrete-time Markov chains, we 
deduce 

Corollary 42 Every state $i \in \mathbb{S}$ is either recurrent or transient for $X$. 

Recall that a class property is a property of states that either all states in a (communicating) 
class have or all states in a (communicating) class don't have. 

Corollary 43 Recurrence and transience are class properties. 

Proof: If $i$ is recurrent and $i \leftrightarrow j$ for $X$, then $i$ is recurrent and $i \leftrightarrow j$ for $M$. From 
discrete-time Markov chain theory, we know that $j$ is recurrent for $M$. Therefore $j$ is 
recurrent for $X$. 

The proof for transience is similar. □ 



Proposition 44 For any $i \in \mathbb{S}$ the following are equivalent: 

(i) $i$ is recurrent for $X$. 

(ii) $\lambda_i = 0$ or $\mathbb{P}_i(H_i < \infty) = 1$. 

(iii) $\int_0^\infty p_{ii}(t)\,dt = \infty$. 

Proof: (iii)$\Rightarrow$(ii): One can deduce this from the corresponding discrete-time result, but 
we give a direct argument here. Assume $\lambda_i > 0$ and $h_i = \mathbb{P}_i(H_i = \infty) > 0$. Then 
the strong Markov property at $H_i^{(m)}$ states that, given $H_i^{(m)} < \infty$, the post-$H_i^{(m)}$ process 
$X^{(m+1)} = (X_{H_i^{(m)} + t})_{t \ge 0}$ is distributed as $X$ and independent of the pre-$H_i^{(m)}$ process. Now 
the total number $G$ of visits of $X$ to $i$ must have a geometric distribution with parameter 
$h_i$ since $\mathbb{P}_i(G = 1) = h_i$ and $\mathbb{P}_i(G = m \mid G \ge m) = h_i$, $m \ge 2$. Therefore, the total time 
spent in $i$ is (with $N_i^{(0)} = 0$) 

$$\sum_{m=0}^{G-1} Z_{N_i^{(m)}} \sim \mathrm{Exp}(h_i \lambda_i), \quad \text{cf. Solution to Exercise A.2.6.}$$

With notation $1_{\{X_t = i\}} = 1$ if $X_t = i$ and $1_{\{X_t = i\}} = 0$ otherwise, we obtain by Tonelli's 
theorem 

$$\int_0^\infty p_{ii}(t)\,dt = \int_0^\infty \mathbb{E}_i\left(1_{\{X_t = i\}}\right) dt = \mathbb{E}_i\left(\int_0^\infty 1_{\{X_t = i\}}\,dt\right) = \mathbb{E}_i\left(\sum_{m=0}^{G-1} Z_{N_i^{(m)}}\right) = \frac{1}{h_i \lambda_i} < \infty.$$

The other implications can be established using similar arguments. □ 
7.3 Positive and null recurrence 

As in the discrete-time case, there is a link between recurrence and the existence of 
invariant distributions. More precisely, recurrence is strictly weaker. The stronger notion 
required is positive recurrence: 

Definition 45 A state $i \in \mathbb{S}$ is called positive recurrent if either $\lambda_i = 0$ or $m_i = \mathbb{E}_i(H_i) < \infty$. 
Otherwise, we call $i$ null recurrent. 

Fact 46 Positive recurrence is a class property. 

7.4 Examples 

Example 47 The M/M/1 queue with $\lambda > 0$ and $\mu > 0$ is irreducible since for all 
$m > n \ge 0$, we have $q_{m,m-1} \cdots q_{n+1,n} = \mu^{m-n} > 0$ and $q_{n,n+1} \cdots q_{m-1,m} = \lambda^{m-n} > 0$, and 
Proposition 39 yields $m \leftrightarrow n$. 

$\lambda > \mu$ means that customers arrive at a higher rate than they leave. Intuitively, this 
means that $X_t \to \infty$ (this can be shown by comparison of the jump chain with a simple 
random walk with up probability $\lambda/(\lambda + \mu) > 1/2$). As a consequence, $L_i = \sup\{t \ge 0 : X_t = i\} < \infty$ 
for all $i \in \mathbb{N}$, and since $\{t \ge 0 : X_t = i\} \subset [0, L_i]$, we deduce that $i$ is 
transient. 

$\lambda < \mu$ means that customers arrive at a slower rate than they can leave. Intuitively, 
this means that $X_t$ will return to zero infinitely often. The mean of the return time 
can be estimated by comparison of the jump chain with a simple random walk with up 
probability $\lambda/(\lambda + \mu) < 1/2$: 

$$m_0 = \mathbb{E}_0(H_0) = \mathbb{E}_0\left(Z_0 + \sum_{k=1}^{N_0 - 1} Y_k\right) = \frac{1}{\lambda} + \frac{\mathbb{E}_0(N_0) - 1}{\lambda + \mu} < \infty,$$

where $Y_1, Y_2, \ldots \sim \mathrm{Exp}(\lambda + \mu)$. Therefore, 0 is positive recurrent. Since positive recurrence 
is a class property, all states are positive recurrent. 

For $\lambda = \mu$, the same argument shows that 0 is null recurrent, by comparison with 
simple symmetric random walk. 

Note in each case that the jump chain is not a simple random walk, but coincides 
with a simple random walk until it hits zero. This is enough to calculate $\mathbb{E}_0(N_0)$. 

Example 48 Let $\lambda \ge 0$ and $\mu \ge 0$. Consider a simple birth and death process with 
$Q$-matrix $Q = (q_{nm})_{n,m \in \mathbb{N}}$, where $q_{nn} = -n(\lambda + \mu)$, $q_{n,n+1} = n\lambda$, $q_{n,n-1} = n\mu$, $q_{nm} = 0$ 
otherwise. 

• If $\mu = 0$ and $\lambda = 0$, then $Q = 0$, all states are absorbing, so the communicating 
classes are $\{n\}$, $n \in \mathbb{N}$. They are all closed and positive recurrent. 

• If $\mu = 0$ and $\lambda > 0$, then 0 is still absorbing since $q_{00} = 0$, but otherwise $n \to m$ if 
and only if $1 \le n \le m$. Again, the communicating classes are $\{n\}$, $n \in \mathbb{N}$; $\{0\}$ is 
closed and positive recurrent, but $\{n\}$ is open and transient for all $n \ge 1$, since the 
process will not return after the $\mathrm{Exp}(n\lambda)$ holding time. 

• If $\mu > 0$ and $\lambda = 0$, then $\{0\}$ is still absorbing, and $\{n\}$ is an open transient class for all $n \ge 1$. 

• If $\mu > 0$ and $\lambda > 0$, then $\{0\}$ is still absorbing, and $\{1, 2, \ldots\}$ is an open and transient 
communicating class. It can be shown that the process when starting from $i \ge 1$ 
will be absorbed in $\{0\}$ if $\lambda \le \mu$ and that it will do so with a probability in $(0, 1)$ 
if $\lambda > \mu$. 



Lecture 8 

Convergence to equilibrium 



Reading: Norris 3.5-3.8 
Further reading: Grimmett-Stirzaker 6.9; Ross 6.5-6.6 

In Lecture 7 we studied the class structure of continuous-time Markov chains. We can 
summarize the findings by saying that the state space can be decomposed into (disjoint) 
communicating classes 

$$\mathbb{S} = \bigcup_{m \ge 1} R_m \cup \bigcup_{m \ge 1} T_m,$$

where the (states in) $R_m$ are recurrent, hence closed, and the $T_m$ are transient, whether 
closed or not. This is the same as for discrete-time Markov chains, in fact equivalent to 
the decomposition for the associated jump chain. Furthermore, each recurrent class is 
either positive recurrent or null recurrent. 

To understand equilibrium behaviour, one should look at each recurrent class sepa- 
rately. The complete picture can then be set together from its pieces on the separate 
classes. This is relevant in some applications, but not for the majority, and not for those 
we want to focus on here. We therefore only treat the case where we have only one class 
that is recurrent. We called this case irreducible. The reason for this name is that we 
cannot further reduce the state space without changing the transition mechanisms. We 
will further focus on the positive recurrent case. 

8.1 Invariant distributions 

Note that for an initial distribution $\nu$ on $\mathbb{S}$, $X_0 \sim \nu$, we have 

$$\mathbb{P}(X_t = j) = \sum_{i \in \mathbb{S}} \nu_i\, p_{ij}(t) = (\nu P(t))_j,$$

where $\nu P(t)$ is the product of a row vector $\nu$ with the matrix $P(t)$, and we extract the 
$j$th component of the resulting row vector. 

Definition 49 A distribution $\xi$ on $\mathbb{S}$ is called invariant for a continuous-time Markov 
chain if $\xi P(t) = \xi$ for all $t \ge 0$. 

If we take an invariant distribution $\xi$ as initial distribution, then $X_t \sim \xi$ for all $t \ge 0$. 
We then say that $X$ is in equilibrium. 



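Proposition 50 If $\mathbb{S}$ is finite, then $\xi$ is invariant if and only if $\xi Q = 0$. 

Proof: If $\xi P(t) = \xi$ for all $t \ge 0$, then by the forward equation 

$$0 = \lim_{h \to 0} \frac{\xi P(t+h) - \xi P(t)}{h} = \xi \lim_{h \to 0} \frac{P(t+h) - P(t)}{h} = \xi P'(t) = \xi P(t) Q = \xi Q.$$

If $\xi Q = 0$, we have 

$$\xi P(t) = \xi P(0) + \xi \int_0^t P'(s)\,ds = \xi + \int_0^t \xi Q P(s)\,ds = \xi,$$

where we applied the backward equation. Here, also the integration is understood componentwise. 

Interchanging limits/integrals and matrix multiplication is justified since $\mathbb{S}$ is finite. 

□ 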

Fact 51 Let $\mathbb{S}$ be infinite, let $Q$ be a $Q$-matrix and let $(P(t))_{t \ge 0}$ be the transition matrices of the 
minimal continuous-time Markov chain associated with the $Q$-matrix $Q$. Then $\xi Q = 0$ if and 
only if $\xi P(t) = \xi$ for all $t \ge 0$. 

As a consequence, $\xi$ can then only exist if $X$ is non-explosive in the sense that 
$\mathbb{P}(T_\infty = \infty) = 1$. 



Fact 52 An irreducible (minimal) continuous-time Markov chain is positive recurrent if 
and only if it has an invariant distribution. An invariant distribution $\xi$ can then be given 
by 

$$\xi_i = \frac{1}{m_i \lambda_i},$$

where $m_i = \mathbb{E}_i(H_i)$ is the mean return time to $i$ and $\lambda_i = -q_{ii}$ the holding rate in $i$. 

The proof is quite technical and does not give further intuition. The analogous result 
for discrete chains holds and gives $\eta_i = 1/\mathbb{E}_i(N_i)$ as invariant distribution. The further 
factor $\lambda_i$ occurs because a chain in stationarity is likely to be found in $i$ if the return time 
is short and the holding time is long; both observations are reflected through the inverse 
proportionality to $m_i$ and $\lambda_i$, respectively. Since this is a key result for both the Convergence 
Theorem and the Ergodic Theorem, the diligent reader may want to refer to Norris Theorem 
3.5.3. 
Example 53 Consider the M/M/1 queue of Example 29. The equations $\xi Q = 0$ are 
given by 

$$-\lambda \xi_0 + \mu \xi_1 = 0, \qquad \lambda \xi_{i-1} - (\lambda + \mu)\xi_i + \mu \xi_{i+1} = 0, \quad i \ge 1.$$

This system of linear equations (for the unknowns $\xi_i$, $i \in \mathbb{N}$) has a probability mass 
function as its solution if and only if $\lambda < \mu$. It is given by the geometric probabilities 

$$\xi_i = \left(\frac{\lambda}{\mu}\right)^i \left(1 - \frac{\lambda}{\mu}\right), \qquad i \in \mathbb{N}.$$

By Fact 52, we can calculate $\mathbb{E}_i(H_i) = m_i = 1/(\lambda_i \xi_i)$. In particular, for $i = 0$, we have 
the length of a full cycle beginning and ending with an empty queue. Since the initial 
empty period has average length $1/\lambda$, the busy period has mean length 

$$\mathbb{E}_0(H_0) - \frac{1}{\lambda} = \frac{1}{\lambda(1 - \lambda/\mu)} - \frac{1}{\lambda} = \frac{1}{\mu - \lambda}.$$

Note that this tends to infinity as $\lambda \uparrow \mu$. 


8.2 Convergence to equilibrium 

The convergence theorem is of central importance in applications since it is often assumed 
that a system is in equilibrium. The convergence theorem is a justification for 
this assumption, since it means that a system must only be running long enough to be 
(approximately) in equilibrium. 

Theorem 54 Let $X = (X_t)_{t \ge 0}$ be a (minimal) irreducible positive-recurrent continuous-time 
Markov chain, $X_0 \sim \nu$, and $\xi$ an invariant distribution. Then 

$$\mathbb{P}(X_t = j) \to \xi_j \quad \text{as } t \to \infty, \text{ for all } j \in \mathbb{S}.$$

This result can be deduced from the convergence result for discrete-time Markov 
chains by looking at the processes $Z_n^{(h)} = X_{nh}$ that are easily seen to be Markov chains 
with transition matrices $P(h)$. 

However, it is more instructive to see a (very elegant) direct argument, using the 
coupling method in continuous time. 

Sketch of proof: Let $X$ be the continuous-time Markov chain starting according to $\nu$, and $Y$ 
an independent continuous-time Markov chain with the same $Q$-matrix, but starting from 
the invariant distribution $\xi$. Choose $i \in \mathbb{S}$ and define $T = \inf\{t \ge 0 : (X_t, Y_t) = (i, i)\}$, 
the time they first meet (in $i$, to simplify the argument). A third process is constructed as 
$\tilde{X}_t = X_t$, $t < T$ (following the $\nu$-chain before $T$), $\tilde{X}_t = Y_t$, $t \ge T$ (following the $\xi$-chain 
after $T$). The following three steps complete the proof: 

1. the meeting time $T$ is finite with probability 1 (this is because $\eta_{ij} = \xi_i \xi_j$ is a stationary 
distribution for the bivariate process $(X, Y)$ and existence of a stationary distribution 
implies positive recurrence of $(X, Y)$, by Fact 52); 

2. the third chain $\tilde{X}$ has the same distribution as the $\nu$-chain $X$; 

3. the third chain (which eventually coincides with the $\xi$-chain $Y$) is asymptotically 
in equilibrium in the sense of the convergence statement in Theorem 54. 

□ 

Note that we obtain the uniqueness of the invariant distribution as a consequence 
since also the marginal distribution of a Markov chain starting from a second invariant 
distribution would remain invariant and converge to ^. 

Theorem 55 (Ergodic theorem) In the setting of Theorem 54, $X_0 \sim \nu$, 

$$\mathbb{P}\left(\frac{1}{t} \int_0^t 1_{\{X_s = i\}}\,ds \to \xi_i \text{ as } t \to \infty\right) = 1.$$

Proof: A proof using renewal theory is in assignment question A.5.4. □ 

We interpret this as follows. For any initial distribution, the long-term proportion of 
time spent in any state $i$ approaches the invariant probability for this state. This result 
establishes a time-average analogue for the spatial average of Theorem 54. This is of great 
practical importance, since it allows us to observe the invariant distribution by looking 
at time proportions over a long period of time. If we tried to observe the stationary 
distribution using Theorem 54, we would need many independent observations of the 
same system at a large time t to estimate ^. 
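The time-average interpretation can be illustrated by simulating a single long path. The following sketch uses a hypothetical two-state chain (rate `a` from 0 to 1, rate `b` from 1 to 0), whose invariant distribution is $\xi = (b/(a+b), a/(a+b))$; the function name, the horizon and the seed are illustrative assumptions.

```python
import random


def time_fraction_in_state(rate_out, target, horizon, seed=0):
    """Simulate a two-state chain on {0, 1} started in 0 up to time `horizon`
    and return the proportion of time spent in `target`.
    The holding time in state s is Exp(rate_out[s]); the chain then jumps to 1 - s."""
    rng = random.Random(seed)
    t, state, time_in_target = 0.0, 0, 0.0
    while True:
        hold = rng.expovariate(rate_out[state])
        if t + hold >= horizon:
            # the current holding period is cut off at the horizon
            if state == target:
                time_in_target += horizon - t
            break
        if state == target:
            time_in_target += hold
        t += hold
        state = 1 - state
    return time_in_target / horizon


a, b = 1.0, 2.0
xi_0 = b / (a + b)  # invariant probability of state 0
fraction = time_fraction_in_state([a, b], target=0, horizon=20000.0)
```

A single path suffices here, whereas an estimate based on Theorem 54 would need many independent copies of the chain observed at one large time.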

8.3 Detailed balance equations and time reversal 

Proposition 56 Consider a $Q$-matrix $Q$. If the detailed balance equations 

$$\xi_i q_{ij} = \xi_j q_{ji}, \qquad i, j \in \mathbb{S},$$

have a solution $\xi = (\xi_i)_{i \in \mathbb{S}}$, then $\xi$ is a stationary distribution. 

Proof: Let $\xi$ be such that all detailed balance equations hold. Then fix $j \in \mathbb{S}$ and sum 
the equations over $i \in \mathbb{S}$ to get 

$$(\xi Q)_j = \sum_{i \in \mathbb{S}} \xi_i q_{ij} = \sum_{i \in \mathbb{S}} \xi_j q_{ji} = \xi_j \sum_{i \in \mathbb{S}} q_{ji} = 0,$$

since the row sums of any $Q$-matrix vanish (see Remark 26). Therefore $\xi Q = 0$, as 
required. □ 

Note that (in the case of finite $\#\mathbb{S} = n$), while $\xi Q = 0$ is a set of as many equations as 
unknowns, $n$, the detailed balance equations form a set of $n(n-1)/2$ different equations 
for $n$ unknowns, so one would not expect solutions, in general. However, if the $Q$-matrix is 
sparse, i.e. contains lots of zeros, corresponding equations will be automatically satisfied, 
and these are the cases where we will successfully apply detailed balance equations. 

The class of continuous-time Markov chains for which the detailed balance equations 
have solutions can be studied further. They also arise naturally in the context of time 
reversal, a tool that may seem of little practical relevance, since our world lives forward 
in time, but sometimes it is useful to model by a random process an unknown past. 
Sometimes, one can identify a duality relationship between two different processes, both 
forward in time, that reveals that the behaviour of one is the same as the behaviour of 
the time reversal of the other. This can allow us to translate known results for one into 
interesting new results for the other. 

Proposition 57 Let $X$ be an irreducible positive recurrent (minimal) continuous-time 
Markov chain with $Q$-matrix $Q$ and starting from the invariant distribution $\xi$. Let $t > 0$ 
be a fixed time and $\hat{X}_s = X_{(t-s)-}$. Then the process $\hat{X}$ is a continuous-time Markov chain 
with $Q$-matrix $\hat{Q}$ given by $\xi_j \hat{q}_{ji} = \xi_i q_{ij}$. 

Proof: First note that $\hat{Q}$ has the properties of a $Q$-matrix in being non-negative off the 
diagonal and satisfying 

$$\sum_{j \in \mathbb{S}} \hat{q}_{ij} = \frac{1}{\xi_i} \sum_{j \in \mathbb{S}} \xi_j q_{ji} = 0$$

by the invariance of $\xi$. Similarly, we define $\xi_j \hat{p}_{ji}(t) = \xi_i p_{ij}(t)$ and see that $\hat{P}(t)$ have the 
properties of transition matrices. In fact the transposed forward equation $P'(t) = P(t)Q$ 
yields $\hat{P}'(t) = \hat{Q}\hat{P}(t)$, the backward equation for $\hat{P}(t)$. Now $\hat{X}$ is a continuous-time 
Markov chain with transition probabilities $\hat{P}(t)$ since, for $0 \le t_0 < t_1 < \ldots < t_n \le t$, 

$$\mathbb{P}_\xi(\hat{X}_{t_0} = i_0, \ldots, \hat{X}_{t_n} = i_n) = \mathbb{P}_\xi(X_{t - t_n} = i_n, \ldots, X_{t - t_0} = i_0)$$

$$= \xi_{i_n} \prod_{k=1}^{n} p_{i_k, i_{k-1}}(t_k - t_{k-1})$$

$$= \xi_{i_0} \prod_{k=1}^{n} \hat{p}_{i_{k-1}, i_k}(t_k - t_{k-1}).$$

From this we can deduce the Markov property. More importantly, the finite-dimensional 
distributions of $\hat{X}$ are the ones of a continuous-time Markov chain with transition matrices 
$\hat{P}(t)$. Together with the path structure, Remark 36 implies that $\hat{X}$ is a Markov chain 
with $Q$-matrix $\hat{Q}$. □ 

If $\hat{Q} = Q$, $X$ is called reversible. It is evident from the definition of $\hat{Q}$ that $\xi$ then 
satisfies the detailed balance equations $\xi_i q_{ij} = \xi_j q_{ji}$, $i, j \in \mathbb{S}$. 
8.4 Erlang's formula 

Example 58 Consider the birth-death process with birth rates $q_{i,i+1} = \lambda_i$ and death 
rates $q_{i,i-1} = \mu_i$, $q_{ii} = -\lambda_i - \mu_i$, $i \in \mathbb{N}$, all other entries zero (and also $\mu_0 = 0$). (This 
is standard notation for this type of process, but note that we will not use the 
earlier notation $\lambda_i = -q_{ii}$ here.) 

We recognise birth processes and queueing systems as special cases. 

To calculate invariant distributions, we solve $\xi Q = 0$, i.e. 

$$\xi_1 \mu_1 - \xi_0 \lambda_0 = 0 \quad \text{and} \quad \xi_{n+1}\mu_{n+1} - \xi_n(\lambda_n + \mu_n) + \xi_{n-1}\lambda_{n-1} = 0, \quad n \ge 1,$$

or more easily the detailed balance equations 

$$\xi_i \lambda_i = \xi_{i+1} \mu_{i+1},$$

giving 

$$\xi_n = \frac{\lambda_{n-1} \cdots \lambda_0}{\mu_n \cdots \mu_1}\, \xi_0,$$

where $\xi_0$ is determined by the normalisation requirement of $\xi$ to be a probability mass 
function, i.e. 

$$\xi_0 = \frac{1}{S} \quad \text{where} \quad S = 1 + \sum_{n \ge 1} \frac{\lambda_{n-1} \cdots \lambda_0}{\mu_n \cdots \mu_1},$$

provided $S$ is finite. 

If $S$ is infinite, then there does not exist an invariant distribution. This cannot be 
deduced from the detailed balance equations but can be argued directly by showing that 
$\xi Q = 0$ does not have a solution. It does not necessarily mean explosion in finite time, 
but includes all simple birth processes since they model growing populations and cannot 
be in equilibrium. By Fact 52, it means that $X$ is then null recurrent or transient. 

On the other hand, if $\lambda_0 = 0$ as in many population models, then the invariant 
distribution is concentrated in 0, i.e. $\xi_0 = 1$, $\xi_n = 0$ for all $n \ge 1$. 

Many special cases can be given more explicitly. E.g., if $\lambda_n = \lambda$, $n \ge 0$, $\mu_n = n\mu$, we 
get 

$$\xi_n = \frac{(\lambda/\mu)^n}{n!}\, e^{-\lambda/\mu}.$$

You recognise the Poisson probabilities. What is this model? We can give two different 
interpretations, both of which tie in with models that we have studied. First, as a 
population model, $\lambda_n = \lambda$ means that arrivals occur according to a Poisson process; this 
can model immigration. The rate $\mu_n = n\mu$ is obtained from as many $\mathrm{Exp}(\mu)$ clocks as individuals 
in the population, i.e. independent $\mathrm{Exp}(\mu)$ lifetimes for all individuals. Second, as a 
queueing model with arrivals according to a Poisson process, each individual leaves the 
system after an $\mathrm{Exp}(\mu)$ time, no matter how many other people are in the system. This 
can be obtained from infinitely many servers working at rate $\mu$. 
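The product formula for birth-death processes is straightforward to evaluate on a truncated state space. The sketch below is illustrative only: the function name `birth_death_invariant`, the truncation level 80 and the rates are assumptions chosen for the example. It recovers the Poisson probabilities for $\lambda_n = \lambda$, $\mu_n = n\mu$.

```python
import math


def birth_death_invariant(birth, death, n_max):
    """Return xi_0, ..., xi_{n_max} with xi_n proportional to
    (lambda_{n-1} ... lambda_0) / (mu_n ... mu_1), normalised to sum to 1.
    Truncating at n_max is an approximation when the state space is infinite."""
    weights = [1.0]
    for n in range(1, n_max + 1):
        # detailed balance: xi_{n-1} * lambda_{n-1} = xi_n * mu_n
        weights.append(weights[-1] * birth(n - 1) / death(n))
    total = sum(weights)
    return [w / total for w in weights]


lam, mu = 2.0, 1.0
xi = birth_death_invariant(lambda n: lam, lambda n: n * mu, 80)
poisson = [math.exp(-lam / mu) * (lam / mu) ** n / math.factorial(n)
           for n in range(81)]
```

The truncation error is negligible here because the Poisson tail beyond 80 is far below floating-point precision for these rates; for slowly decaying weights a larger cut-off would be needed.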



Lecture 9 

The Strong Law of Large Numbers 



Reading: Grimmett-Stirzaker 7.2; David Williams "Probability with Martingales" 7.2 

Further reading: Grimmett-Stirzaker 7.1, 7.3-7.5 

With the Convergence Theorem (Theorem 54) and the Ergodic Theorem (Theorem 
55) we have two very different statements of convergence of something to a stationary 
distribution. We are looking at a recurrent Markov chain $(X_t)_{t \ge 0}$, i.e. one that visits 
every state at arbitrarily large times, so clearly $X_t$ itself does not converge, as $t \to \infty$. 
In this lecture, we look more closely at the different types of convergence and develop 
methods to show the so-called almost sure convergence, of which the statement of the 
Ergodic Theorem is an example. 

9.1 Modes of convergence 

Definition 59 Let $X_n$, $n \ge 1$, and $X$ be random variables. Then we define 

1. $X_n \to X$ in probability, if for all $\varepsilon > 0$, $\mathbb{P}(|X_n - X| > \varepsilon) \to 0$ as $n \to \infty$. 

2. $X_n \to X$ in distribution, if $\mathbb{P}(X_n \le x) \to \mathbb{P}(X \le x)$ as $n \to \infty$, for all $x \in \mathbb{R}$ at 
which $x \mapsto \mathbb{P}(X \le x)$ is continuous. 

3. $X_n \to X$ in $L^1$, if $\mathbb{E}(|X_n|) < \infty$ for all $n \ge 1$ and $\mathbb{E}(|X_n - X|) \to 0$ as $n \to \infty$. 

4. $X_n \to X$ almost surely (a.s.), if $\mathbb{P}(X_n \to X \text{ as } n \to \infty) = 1$. 

Almost sure convergence is the notion that we will study in more detail here. It helps 
to consider random variables as functions $X_n : \Omega \to \mathbb{R}$ on a sample space $\Omega$, or at least as 
functions of a common, typically infinite, family of independent random variables. What 
is different here from previous parts of the course (except for the Ergodic Theorem, which 
we have yet to inspect more thoroughly), is that we want to calculate probabilities that 
fundamentally depend on an infinite number of random variables. So far, we have been 
able to revert to events depending on only finitely many random variables by conditioning. 
This will not work here. 
Let us start by recalling the definition of convergence of sequences, as $n \to \infty$, 

$$x_n \to x \iff \forall_{m \ge 1}\, \exists_{n_m \ge 1}\, \forall_{n \ge n_m}\ |x_n - x| < 1/m.$$

If we want to consider all sequences $(x_n)_{n \ge 1}$ of possible values of the random variables 
$(X_n)_{n \ge 1}$, then 

$$n_m = \inf\{k \ge 1 : \forall_{n \ge k}\ |x_n - x| < 1/m\} \in \mathbb{N} \cup \{\infty\}$$

will vary as a function of the sequence $(x_n)_{n \ge 1}$, and so it will become a random variable 

$$N_m = \inf\{k \ge 1 : \forall_{n \ge k}\ |X_n - X| < 1/m\} \in \mathbb{N} \cup \{\infty\}$$

as a function of $(X_n)_{n \ge 1}$. This definition of $N_m$ permits us to write 

$$\mathbb{P}(X_n \to X) = \mathbb{P}(\forall_{m \ge 1}\ N_m < \infty).$$

This will occasionally help when we are given almost sure convergence, but is not much 
use when we want to prove almost sure convergence. To prove almost sure convergence, 
we can transform as follows 

$$\mathbb{P}(\forall_{m \ge 1}\, \exists_{N \ge 1}\, \forall_{n \ge N}\ |X_n - X| < 1/m) = 1$$

$$\iff \mathbb{P}(\exists_{m \ge 1}\, \forall_{N \ge 1}\, \exists_{n \ge N}\ |X_n - X| \ge 1/m) = 0.$$

We are used to events such as $A_{m,n} = \{|X_n - X| \ge 1/m\}$, and we understand events as 
subsets of $\Omega$, or loosely identify this event as the set of all $((x_k)_{k \ge 1}, x)$ for which $|x_n - x| \ge 1/m$. 
This is useful, because we can now translate $\exists_{m \ge 1}\, \forall_{N \ge 1}\, \exists_{n \ge N}$ into set operations and 
write 

$$\mathbb{P}\left(\bigcup_{m \ge 1} \bigcap_{N \ge 1} \bigcup_{n \ge N} A_{m,n}\right) = 0.$$

This event can only have zero probability if all events $\bigcap_{N \ge 1} \bigcup_{n \ge N} A_{m,n}$, $m \ge 1$, have 
zero probability (formally, this follows from the sigma-additivity of the measure $\mathbb{P}$). The 
Borel-Cantelli lemma will give a criterion for this. 

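Proposition 60 The following implications hold: 

$X_n \to X$ almost surely $\Rightarrow$ $X_n \to X$ in probability $\Rightarrow$ $X_n \to X$ in distribution; 

$X_n \to X$ in $L^1$ $\Rightarrow$ $X_n \to X$ in probability; 

$X_n \to X$ in $L^1$ $\Rightarrow$ $\mathbb{E}(X_n) \to \mathbb{E}(X)$. 

No other implications hold in general. 

Proof: Most of this is Part A material. Some counterexamples are on Assignment 
5. It remains to prove that almost sure convergence implies convergence in probability. 
Suppose $X_n \to X$ almost surely. Then the above considerations yield $\mathbb{P}(\forall_{m \ge 1}\ N_m < \infty) = 1$, 
i.e. $\mathbb{P}(N_k < \infty) \ge \mathbb{P}(\forall_{m \ge 1}\ N_m < \infty) = 1$ for all $k \ge 1$. 

Now fix $\varepsilon > 0$. Choose $m \ge 1$ such that $1/m < \varepsilon$. Then clearly $|X_n - X| > \varepsilon > 1/m$ 
implies $N_m > n$ so that 

$$\mathbb{P}(|X_n - X| > \varepsilon) \le \mathbb{P}(N_m > n) \downarrow \mathbb{P}(N_m = \infty) = 0,$$

as $n \to \infty$, for any $\varepsilon > 0$. Therefore, $X_n \to X$ in probability. □ 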
9.2 The first Borel-Cantelli lemma 

Let us now work on a sample space $\Omega$. It is safe to think of $\Omega = \mathbb{R}^{\mathbb{N}} \times \mathbb{R}$ and $\omega \in \Omega$ as 
$\omega = ((x_n)_{n \ge 1}, x)$, as the set of possible outcomes for an infinite family of random variables 
(and a limiting variable). 

The Borel-Cantelli lemmas are useful to prove almost sure results. Particularly limiting 
results often require certain events to happen infinitely often (i.o.) or only a finite 
number of times. Logically, this can be expressed as follows. Consider events $A_n \subset \Omega$, 
$n \ge 1$. Then 

$$\omega \in A_n \text{ i.o.} \iff \forall_{n \ge 1}\, \exists_{m \ge n}\ \omega \in A_m \iff \omega \in \bigcap_{n \ge 1} \bigcup_{m \ge n} A_m.$$

Lemma 61 (Borel-Cantelli (first lemma)) Let $A = \bigcap_{n \ge 1} \bigcup_{m \ge n} A_m$ be the event that 
infinitely many of the events $A_n$ occur. Then 

$$\sum_{n \ge 1} \mathbb{P}(A_n) < \infty \quad \Rightarrow \quad \mathbb{P}(A) = 0.$$

Proof: We have that $A \subset \bigcup_{m \ge n} A_m$ for all $n \ge 1$, and so 

$$\mathbb{P}(A) \le \mathbb{P}\left(\bigcup_{m \ge n} A_m\right) \le \sum_{m \ge n} \mathbb{P}(A_m) \to 0 \quad \text{as } n \to \infty$$

whenever $\sum_{n \ge 1} \mathbb{P}(A_n) < \infty$. □ 

9.3 The Strong Law of Large Numbers 

Theorem 62 Let $(X_n)_{n \ge 1}$ be a sequence of independent and identically distributed (iid) 
random variables with $\mathbb{E}(X_1^4) < \infty$ and $\mathbb{E}(X_1) = \mu$. Then 

$$\frac{S_n}{n} := \frac{1}{n} \sum_{k=1}^{n} X_k \to \mu \quad \text{almost surely, as } n \to \infty.$$

Fact 63 Theorem 62 remains valid without the assumption $\mathbb{E}(X_1^4) < \infty$, just assuming 
$\mathbb{E}(|X_1|) < \infty$. 

The proof for the general result is hard, but under the extra moment condition 
$\mathbb{E}(X_1^4) < \infty$ there is a nice proof. 

Lemma 64 In the situation of Theorem 62, there is a constant $K < \infty$ such that for all 
$n \ge 0$ 

$$\mathbb{E}\left((S_n - n\mu)^4\right) \le K n^2.$$
Proof: Let $Z_k = X_k - \mu$ and $T_n = Z_1 + \ldots + Z_n = S_n - n\mu$. Then 

$$\mathbb{E}(T_n^4) = \mathbb{E}\left(\left(\sum_{i=1}^{n} Z_i\right)^4\right) = n\,\mathbb{E}(Z_1^4) + 3n(n-1)\,\mathbb{E}(Z_1^2 Z_2^2) \le K n^2$$

by expanding the fourth power and noting that most terms vanish, such as 

$$\mathbb{E}(Z_1 Z_2^3) = \mathbb{E}(Z_1)\,\mathbb{E}(Z_2^3) = 0.$$

$K$ was chosen appropriately, say $K = 4\max\{\mathbb{E}(Z_1^4), (\mathbb{E}(Z_1^2))^2\}$. □ 
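The fourth-moment identity in the proof of Lemma 64 can be verified exactly in a small case. The sketch below takes the $Z_k$ to be symmetric signs ($\pm 1$ with probability $1/2$ each), an illustrative choice for which $\mathbb{E}(Z_1^4) = \mathbb{E}(Z_1^2) = 1$, so that the formula predicts $\mathbb{E}(T_n^4) = n + 3n(n-1)$ and $K = 4$ works. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def fourth_moment_of_sum(n):
    """Exact E(T_n^4) for T_n = Z_1 + ... + Z_n with independent signs
    Z_k = +1 or -1 (probability 1/2 each), by enumerating all 2^n outcomes."""
    total = sum(sum(signs) ** 4 for signs in product((-1, 1), repeat=n))
    return Fraction(total, 2 ** n)


# Compare with n E(Z^4) + 3 n (n - 1) E(Z^2)^2 = n + 3 n (n - 1) for small n.
moments = {n: fourth_moment_of_sum(n) for n in range(1, 9)}
```

For example, for $n = 2$ the sum takes values $-2, 0, 0, 2$ with equal probability, giving $\mathbb{E}(T_2^4) = 8 = 2 + 3 \cdot 2 \cdot 1$.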
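Proof of Theorem 62: By the lemma, 

$$\mathbb{E}\left(\left(\frac{S_n}{n} - \mu\right)^4\right) \le K n^{-2}.$$

Now, by Tonelli's theorem, 

$$\mathbb{E}\left(\sum_{n \ge 1} \left(\frac{S_n}{n} - \mu\right)^4\right) = \sum_{n \ge 1} \mathbb{E}\left(\left(\frac{S_n}{n} - \mu\right)^4\right) < \infty \quad \Rightarrow \quad \sum_{n \ge 1} \left(\frac{S_n}{n} - \mu\right)^4 < \infty \text{ almost surely.}$$

But if a series converges, the underlying sequence converges to zero, and so 

$$\left(\frac{S_n}{n} - \mu\right)^4 \to 0 \text{ almost surely} \quad \Rightarrow \quad \frac{S_n}{n} \to \mu \text{ almost surely.}$$

□ 

This proof did not use the Borel-Cantelli lemma, but we can also conclude by the 
Borel-Cantelli lemma: 

Proof of Theorem 62: We know by Markov's inequality that 

$$\mathbb{P}\left(\frac{1}{n}|S_n - n\mu| \ge n^{-\gamma}\right) \le \frac{\mathbb{E}\left((S_n/n - \mu)^4\right)}{n^{-4\gamma}} \le K n^{-2 + 4\gamma}.$$

Define for $\gamma \in (0, 1/4)$ 

$$A_n = \left\{\frac{1}{n}|S_n - n\mu| \ge n^{-\gamma}\right\} \quad \Rightarrow \quad \sum_{n \ge 1} \mathbb{P}(A_n) < \infty \quad \Rightarrow \quad \mathbb{P}(A) = 0$$

by the first Borel-Cantelli lemma, where $A = \bigcap_{n \ge 1} \bigcup_{m \ge n} A_m$. But now, the event $A^c$ happens 
if and only if 

$$\exists_{N}\, \forall_{n \ge N}\ \left|\frac{S_n}{n} - \mu\right| < n^{-\gamma} \quad \Rightarrow \quad \frac{S_n}{n} \to \mu.$$

□ 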
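9.4 The second Borel-Cantelli lemma 

We won't need the second Borel-Cantelli lemma in this course, but include it for completeness. 

Lemma 65 (Borel-Cantelli (second lemma)) Let $A = \bigcap_{n \ge 1} \bigcup_{m \ge n} A_m$ be the event that 
infinitely many of the events $A_n$ occur. Then 

$$\sum_{n \ge 1} \mathbb{P}(A_n) = \infty \text{ and } (A_n)_{n \ge 1} \text{ independent} \quad \Rightarrow \quad \mathbb{P}(A) = 1.$$

Proof: The conclusion is equivalent to $\mathbb{P}(A^c) = 0$. By de Morgan's laws 

$$A^c = \bigcup_{n \ge 1} \bigcap_{m \ge n} A_m^c.$$

However, 

$$\mathbb{P}\left(\bigcap_{m \ge n} A_m^c\right) = \lim_{r \to \infty} \mathbb{P}\left(\bigcap_{m=n}^{r} A_m^c\right) = \prod_{m \ge n} (1 - \mathbb{P}(A_m)) \le \prod_{m \ge n} \exp(-\mathbb{P}(A_m)) = \exp\left(-\sum_{m \ge n} \mathbb{P}(A_m)\right) = 0$$

whenever $\sum_{n \ge 1} \mathbb{P}(A_n) = \infty$. Thus 

$$\mathbb{P}(A^c) = \lim_{n \to \infty} \mathbb{P}\left(\bigcap_{m \ge n} A_m^c\right) = 0.$$

□ 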



As a technical detail: to justify some of the limiting probabilities, we use "continuity of P" 
along increasing and decreasing sequences of events, that follows from the sigma-additivity of 
P, cf. Grimmett-Stirzaker, Lemma 1.3.(5). 



9.5 Examples 

Example 66 (Arrival times in Poisson process) A Poisson process has independent 
and identically distributed inter-arrival times $(Z_n)_{n \ge 0}$ with $Z_n \sim \mathrm{Exp}(\lambda)$. We denoted 
the partial sums (arrival times) by $T_n = Z_0 + \ldots + Z_{n-1}$. The Strong Law of Large 
Numbers yields 

$$\frac{T_n}{n} \to \frac{1}{\lambda} \quad \text{almost surely, as } n \to \infty.$$
Example 67 (Return times of Markov chains) For a positive-recurrent discrete-time 
Markov chain we denoted by 

$$N_i = N_i^{(1)} = \inf\{n > 0 : M_n = i\}, \qquad N_i^{(m+1)} = \inf\{n > N_i^{(m)} : M_n = i\}, \quad m \in \mathbb{N},$$

the successive return times to $i$. By the strong Markov property, the random variables 
$N_i^{(m+1)} - N_i^{(m)}$, $m \ge 1$, are independent and identically distributed. If we define $N_i^{(0)} = 0$ 
and start from $i$, then this holds for $m \ge 0$. The Strong Law of Large Numbers yields 

$$\frac{N_i^{(m)}}{m} \to \mathbb{E}_i(N_i) \quad \text{almost surely, as } m \to \infty.$$

Similarly, in continuous time, for 

$$H_i = H_i^{(1)} = \inf\{t \ge T_1 : X_t = i\}, \qquad H_i^{(m)} = T_{N_i^{(m)}}, \quad m \in \mathbb{N},$$

we get 

$$\frac{H_i^{(m)}}{m} \to \mathbb{E}_i(H_i) = m_i \quad \text{almost surely, as } m \to \infty.$$



Example 68 (Empirical distributions) If $(Y_n)_{n \ge 1}$ is an infinite sample (independent 
and identically distributed random variables) from a discrete distribution $\nu$ on $\mathbb{S}$, then the 
random variables $B_n^{(i)} = 1_{\{Y_n = i\}}$, $n \ge 1$, are also independent and identically distributed 
for each fixed $i \in \mathbb{S}$, as functions of independent variables. The Strong Law of Large 
Numbers yields 

$$\nu_i^{(n)} = \frac{\#\{k = 1, \ldots, n : Y_k = i\}}{n} = \frac{B_1^{(i)} + \ldots + B_n^{(i)}}{n} \to \mathbb{E}(B_1^{(i)}) = \mathbb{P}(Y_1 = i) = \nu_i$$

almost surely, as $n \to \infty$. The probability mass function $\nu^{(n)}$ is called the empirical distribution. 
It lists relative frequencies in the sample and, for a specific realisation, can serve as 
an approximation of the true distribution. In applications of statistics, it is the sample 
distribution associated with a population distribution. The result that empirical distributions 
converge to the true distribution is true uniformly in $i$ and in higher generality; 
it is usually referred to as the Glivenko-Cantelli theorem. 

Remark 69 (Discrete ergodic theorem) If $(M_n)_{n \ge 0}$ is a positive-recurrent discrete-time 
Markov chain, the Ergodic Theorem is a statement very similar to the example of 
empirical distributions 

$$\frac{\#\{k = 0, \ldots, n-1 : M_k = i\}}{n} \to \mathbb{P}_\eta(M_0 = i) = \eta_i \quad \text{almost surely, as } n \to \infty,$$

for a stationary distribution $\eta$, but of course, the $M_n$, $n \ge 0$, are not independent (in 
general). Therefore, we need to work a bit harder to deduce the Ergodic Theorem from 
the Strong Law of Large Numbers. 



Lecture 10 



Renewal processes and equations 



Reading: Grimmett-Stirzaker 10.1-10.2; Ross 7.1-7.3 

10.1 Motivation and definition 

So far, the topic has been continuous-time Markov chains, and we've introduced them 
as discrete-time Markov chains with exponential holding times. In this setting we have 
a theory very much similar to the discrete-time theory, with independence of future and 
past given the present (Markov property), transition probabilities, invariant distributions, 
class structure, convergence to equilibrium, ergodic theorem, time reversal, detailed bal- 
ance etc. A few odd features can occur, mainly due to explosion. 

These parallels are due to the exponential holding times and their lack of memory 
property which is the key to the Markov property in continuous time. In practice, this 
assumption is often not reasonable. 

Example 70 Suppose that you count the changing of batteries for an electrical device. 
Given that the battery has been in use for time t, is its residual lifetime distributed as 
its total lifetime? We would assume this, if we were modelling with a Poisson process. 

We may wish to replace the exponential distribution by other distributions, e.g. one 
that cannot take arbitrarily large values or, for other applications, one that can produce 
clustering effects (many short holding times separated by significantly longer ones). We 
started the discussion of continuous-time Markov chains with birth processes as generalised
Poisson processes. Similarly, we start here by generalising the Poisson process to have
non-exponential but independent identically distributed inter-arrival times.

Definition 71 Let $(Z_n)_{n\ge0}$ be a sequence of independent identically distributed positive
random variables, $T_n = \sum_{k=0}^{n-1} Z_k$, $n\ge1$, the partial sums. Then the process $X = (X_t)_{t\ge0}$
defined by

$$X_t = \#\{n\ge1 : T_n \le t\}$$

is called a renewal process. The common distribution of $Z_n$, $n \ge 0$, is called the inter-arrival
distribution.
Example 72 If $(Y_t)_{t\ge0}$ is a continuous-time Markov chain with $Y_0 = i$, then
$Z_n = H_i^{(n+1)} - H_i^{(n)}$, the times between successive returns to $i$ by $Y$, are independent and
identically distributed (by the strong Markov property). The associated counting process

$$X_t = \#\{n\ge1 : H_i^{(n)} \le t\}$$

counting the visits to $i$ is thus a renewal process.



10.2 The renewal function 

Definition 73 The function $t \mapsto m(t) := \mathbb{E}(X_t)$ is called the renewal function.

It plays an important role in renewal theory. Remember that for $Z_n \sim \mathrm{Exp}(\lambda)$ we had
$X_t \sim \mathrm{Poi}(\lambda t)$ and in particular $m(t) = \mathbb{E}(X_t) = \lambda t$.

To calculate the renewal function for general renewal processes, we should investigate
the distribution of $X_t$. Note that, as for birth processes,

$$X_t = k \iff T_k \le t < T_{k+1},$$

so that we can express

$$\mathbb{P}(X_t = k) = \mathbb{P}(T_k \le t < T_{k+1}) = \mathbb{P}(T_k \le t) - \mathbb{P}(T_{k+1} \le t)$$

in terms of the distributions of $T_k = Z_0 + \cdots + Z_{k-1}$, $k \ge 1$.
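As a sanity check of the identity $\mathbb{P}(X_t = k) = \mathbb{P}(T_k \le t) - \mathbb{P}(T_{k+1} \le t)$, the following sketch evaluates it in the Poisson case, where $T_k$ is a sum of $k$ independent $\mathrm{Exp}(\lambda)$ variables and its distribution function is known in closed form. The function names and the parameter values $\lambda = 2$, $t = 1.5$ are illustrative assumptions, not part of the notes.

```python
import math

def erlang_cdf(k, lam, t):
    """P(T_k <= t) when T_k is a sum of k independent Exp(lam) times (k >= 1)."""
    return 1.0 - sum(math.exp(-lam * t) * (lam * t) ** j / math.factorial(j)
                     for j in range(k))

def prob_count_equals(k, lam, t):
    """P(X_t = k) = P(T_k <= t) - P(T_{k+1} <= t), with the convention T_0 = 0."""
    below = 1.0 if k == 0 else erlang_cdf(k, lam, t)
    return below - erlang_cdf(k + 1, lam, t)

def poisson_pmf(k, mean):
    """Probability mass function of the Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

lam, t = 2.0, 1.5                                   # illustrative values
probs = [prob_count_equals(k, lam, t) for k in range(60)]
renewal_function = sum(k * p for k, p in enumerate(probs))  # truncated E(X_t)
```

Under these assumptions the differences of distribution functions reproduce the $\mathrm{Poi}(\lambda t)$ probabilities, and the truncated mean agrees with $m(t) = \lambda t$ up to rounding.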

Recall that for two independent continuous random variables $S$ and $T$ with densities
$f$ and $g$, the random variable $S + T$ has density

$$(f*g)(u) = \int_{-\infty}^{\infty} f(u-t)g(t)\,dt, \quad u \in \mathbb{R},$$

the convolution (product) of $f$ and $g$, and if $S \ge 0$ and $T \ge 0$, then

$$(f*g)(u) = \int_0^u f(u-t)g(t)\,dt, \quad u \ge 0.$$

It is not hard to check that the convolution product is symmetric, associative and distributes
over sums of functions. While the first two of these properties translate as
$S + T = T + S$ and $(S + T) + U = S + (T + U)$ for associated random variables, the
third property has no such meaning, since sums of densities are no longer probability
densities. However, the definition of the convolution product makes sense for general
nonnegative integrable functions, and we will meet other relevant examples soon. We
can define convolution powers $f^{*(1)} = f$ and $f^{*(k+1)} = f * f^{*(k)}$, $k \ge 1$. Then

$$\mathbb{P}(T_k \le t) = \int_0^t f_{T_k}(s)\,ds = \int_0^t f^{*(k)}(s)\,ds,$$

if $Z_n$, $n \ge 0$, are continuous with density $f$.
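Convolution powers can be approximated on a grid. The sketch below uses the trapezoidal rule for $(f*g)(u) = \int_0^u f(u-t)g(t)\,dt$ and applies it to the $\mathrm{Exp}(1)$ density, for which $f^{*(2)}(u) = ue^{-u}$ and $f^{*(3)}(u) = u^2e^{-u}/2$. The helper name, step size and grid length are assumptions made for illustration.

```python
import math

def convolve_on_grid(f_vals, g_vals, h):
    """Trapezoidal approximation of (f*g)(u) = int_0^u f(u-t) g(t) dt at u = i*h."""
    n = len(f_vals)
    out = [0.0] * n                       # (f*g)(0) = 0
    for i in range(1, n):
        # the two endpoints t = 0 and t = u get weight 1/2
        total = 0.5 * (f_vals[i] * g_vals[0] + f_vals[0] * g_vals[i])
        total += sum(f_vals[i - j] * g_vals[j] for j in range(1, i))
        out[i] = h * total
    return out

h, n = 0.01, 301                          # grid on [0, 3], illustrative choice
grid = [i * h for i in range(n)]
f = [math.exp(-t) for t in grid]          # Exp(1) density
f2 = convolve_on_grid(f, f, h)            # approximates f^{*(2)}
f3 = convolve_on_grid(f, f2, h)           # approximates f^{*(3)} = f * f^{*(2)}
f3_alt = convolve_on_grid(f2, f, h)       # same power with the factors swapped
```

Comparing `f3` with `f3_alt` illustrates the symmetry of the convolution product on this grid.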
Proposition 74 Let $X$ be a renewal process with interarrival density $f$. Then $m(t) =
\mathbb{E}(X_t)$ is differentiable in the weak sense that it is the integral function of

$$m'(s) := \sum_{k=1}^{\infty} f^{*(k)}(s).$$

Lemma 75 Let $X$ be an $\mathbb{N}$-valued random variable. Then $\mathbb{E}(X) = \sum_{k\ge1} \mathbb{P}(X \ge k)$.
Proof: We use Tonelli's Theorem

$$\sum_{k\ge1} \mathbb{P}(X\ge k) = \sum_{k\ge1}\sum_{j\ge k} \mathbb{P}(X=j) = \sum_{j\ge1}\sum_{k=1}^{j} \mathbb{P}(X=j) = \sum_{j\ge0} j\,\mathbb{P}(X=j) = \mathbb{E}(X).$$

□



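Proof of Proposition 74: Let us integrate $\sum_{k=1}^{\infty} f^{*(k)}(s)$ using Tonelli's Theorem

$$\int_0^t \sum_{k=1}^{\infty} f^{*(k)}(s)\,ds = \sum_{k=1}^{\infty}\int_0^t f^{*(k)}(s)\,ds = \sum_{k=1}^{\infty} \mathbb{P}(T_k\le t) = \sum_{k=1}^{\infty} \mathbb{P}(X_t\ge k) = \mathbb{E}(X_t) = m(t).$$

□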



10.3 The renewal equation 

For continuous-time Markov chains, conditioning on the first transition time was a pow- 
erful tool. We can do this here and get what is called the renewal equation. 



Proposition 76 Let $X$ be a renewal process with interarrival density $f$. Then $m(t) =
\mathbb{E}(X_t)$ is the unique (locally bounded) solution of

$$m(t) = F(t) + \int_0^t m(t-s)f(s)\,ds, \quad \text{i.e. } m = F + f*m,$$

where $F(t) = \int_0^t f(s)\,ds = \mathbb{P}(Z_1 \le t)$.

Proof: Conditioning on the first arrival will involve the process $\tilde{X}_u = X_{T_1+u}$, $u \ge 0$.
Note that $\tilde{X}_0 = 1$ and that $\tilde{X} - 1$ is a renewal process with interarrival times $\tilde{Z}_n = Z_{n+1}$,
$n \ge 0$, independent of $T_1$. Therefore

$$\mathbb{E}(X_t) = \int_0^{\infty} f(s)\,\mathbb{E}(X_t \mid T_1 = s)\,ds = \int_0^t f(s)\,\mathbb{E}(\tilde{X}_{t-s})\,ds = F(t) + \int_0^t f(s)\,m(t-s)\,ds,$$

where the last step uses $\mathbb{E}(\tilde{X}_{t-s}) = 1 + m(t-s)$.
For uniqueness, suppose that also $\ell = F + f*\ell$, then $\alpha = \ell - m$ is locally bounded and
satisfies $\alpha = f*\alpha = \alpha*f$. Iteration gives $\alpha = \alpha * f^{*(k)}$ for all $k \ge 1$ and, summing over
$k$ gives for the right hand side something finite:

$$\left|\sum_{k\ge1} \big(\alpha * f^{*(k)}\big)(t)\right| = \left|\Big(\alpha * \sum_{k\ge1} f^{*(k)}\Big)(t)\right| = |(\alpha * m')(t)| = \left|\int_0^t \alpha(t-s)\,m'(s)\,ds\right| \le \Big(\sup_{u\in[0,t]} |\alpha(u)|\Big)\, m(t) < \infty$$

but the left-hand side is infinite unless $\alpha(t) = 0$. Therefore $\ell(t) = m(t)$, for all $t \ge 0$. □

Example 77 We can express $m$ as follows: $m = F + F * \sum_{k\ge1} f^{*(k)}$. Indeed, we check
that $\ell = F + F * \sum_{j\ge1} f^{*(j)}$ satisfies the renewal equation:

$$F + f*\ell = F + F*f + F*\sum_{j\ge2} f^{*(j)} = F + F*\sum_{k\ge1} f^{*(k)} = \ell,$$

just using properties of the convolution product. By Proposition 76, $\ell = m$.
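The renewal equation $m = F + f*m$ can also be solved numerically. The following is a minimal sketch, assuming a uniform grid and the trapezoidal rule for the convolution integral; the solver name and step size are illustrative choices. It is applied to the $\mathrm{Exp}(\lambda)$ case, where the known answer is $m(t) = \lambda t$.

```python
import math

def solve_renewal_equation(f, F, h, n_steps):
    """Approximate m on the grid t_i = i*h from m = F + f*m (trapezoidal rule)."""
    f_vals = [f(i * h) for i in range(n_steps + 1)]
    m = [0.0] * (n_steps + 1)            # m(0) = 0: no arrivals at time 0
    for i in range(1, n_steps + 1):
        # interior nodes s_j = j*h, j = 1..i-1; the endpoint s = t_i contributes m(0) f(t_i) = 0
        interior = sum(m[i - j] * f_vals[j] for j in range(1, i))
        # the endpoint s = 0 contains the unknown m(t_i), so we solve for it
        m[i] = (F(i * h) + h * interior) / (1.0 - 0.5 * h * f_vals[0])
    return m

lam = 1.0                                 # illustrative rate
m_exp = solve_renewal_equation(lambda s: lam * math.exp(-lam * s),
                               lambda t: 1.0 - math.exp(-lam * t),
                               h=0.01, n_steps=500)
```

With these settings `m_exp[i]` should be close to `lam * i * 0.01`. Other interarrival densities can be passed in the same way, provided `f` and `F` are consistent with each other.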



Unlike Poisson processes, general renewal processes do not have a linear renewal function,
but it will be asymptotically linear (Elementary Renewal Theorem, as we will see).
In fact, renewal functions are in one-to-one correspondence with interarrival distributions;
we do not prove this, but it should not be too surprising given that $m = F + f*m$ is
almost symmetric in $f$ and $m$. Unlike the Poisson process, increments of general renewal
processes are not stationary (unless we change the distribution of $Z_0$ in a clever way,
as we will see) nor independent. Some of the important results in renewal theory are
asymptotic results.

These asymptotic results will, in particular, allow us to prove the Ergodic Theorem 
for Markov chains. 



10.4 Strong Law and Central Limit Theorem of renewal theory

Theorem 78 (Strong Law of renewal theory) Let $X$ be a renewal process with mean
interarrival time $\mu \in (0, \infty)$. Then

$$\frac{X_t}{t} \to \frac{1}{\mu} \quad \text{almost surely, as } t \to \infty.$$

Proof: Note that $X$ is constant on $[T_n, T_{n+1})$ for all $n \ge 0$, and therefore constant on
$[T_{X_t}, T_{X_t+1}) \ni t$. Therefore, as soon as $X_t > 0$,

$$\frac{T_{X_t}}{X_t} \le \frac{t}{X_t} < \frac{T_{X_t+1}}{X_t} = \frac{T_{X_t+1}}{X_t+1}\cdot\frac{X_t+1}{X_t}.$$

Now $\mathbb{P}(X_t \to \infty) = 1$, since $X_\infty := \lim_{t\to\infty} X_t \le n \iff T_{n+1} = \infty$, which is absurd, since $T_{n+1} =
Z_0 + \cdots + Z_n$ is a finite sum of finite random variables. Therefore, we conclude from the
Strong Law of Large Numbers for $T_n$, that

$$\frac{T_{X_t}}{X_t} \to \mu \quad \text{almost surely, as } t \to \infty.$$

Therefore, if $X_t \to \infty$ and $T_n/n \to \mu$, then

$$\mu \le \lim_{t\to\infty} \frac{t}{X_t} \le \mu,$$

but this means $\mathbb{P}(X_t/t \to 1/\mu) \ge \mathbb{P}(X_t \to \infty,\ T_n/n \to \mu) = 1$, as required. □

Try to do this proof for convergence in probability. The nasty $\varepsilon$ expressions are not
very useful in this context, and the proof is very much harder. But we can now deduce
a corresponding Weak Law of Renewal Theory, because almost sure convergence implies
convergence in probability.

We also have a Central Limit Theorem:

Theorem 79 (Central Limit Theorem of Renewal Theory) Let $X = (X_t)_{t\ge0}$ be a
renewal process whose interarrival times $(Y_n)_{n\ge0}$ satisfy $0 < \sigma^2 = \mathrm{Var}(Y_1) < \infty$ and
$\mu = \mathbb{E}(Y_1)$. Then

$$\frac{X_t - t/\mu}{\sqrt{t\sigma^2/\mu^3}} \to N(0,1) \quad \text{in distribution, as } t \to \infty.$$

The proof is not difficult and left as an exercise on Assignment 5.
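The Strong Law of renewal theory is easy to observe in simulation. The sketch below, with an assumed Uniform(0, 1) interarrival distribution (so $\mu = 1/2$), a fixed seed and an arbitrary horizon, counts the arrivals of one path up to time $t$ and compares $X_t/t$ with $1/\mu = 2$.

```python
import random

def count_renewals(t, draw_interarrival):
    """X_t = number of arrival times T_n <= t along one simulated path."""
    arrivals, clock = 0, draw_interarrival()   # clock holds the next arrival time
    while clock <= t:
        arrivals += 1
        clock += draw_interarrival()
    return arrivals

rng = random.Random(2024)                      # fixed seed, illustrative choice
draw_uniform = lambda: rng.uniform(0.0, 1.0)   # mean 1/2, variance 1/12
t = 20000.0
rate_estimate = count_renewals(t, draw_uniform) / t
```

By the Central Limit Theorem above, the fluctuations of `rate_estimate` around 2 are of order $\sqrt{\sigma^2/(\mu^3 t)}$, which is below 0.01 for this horizon.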



10.5 The elementary renewal theorem 

Theorem 80 Let $X$ be a renewal process with mean interarrival time $\mu$ and $m(t) =
\mathbb{E}(X_t)$. Then

$$\frac{m(t)}{t} = \frac{\mathbb{E}(X_t)}{t} \to \frac{1}{\mu} \quad \text{as } t \to \infty.$$

Note that this does not follow easily from the strong law of renewal theory since almost
sure convergence does not imply convergence of means (cf. Proposition 60, see also the
counterexample on Assignment 5). In fact, the proof is longer and not examinable: we
start with a lemma.

Lemma 81 For a renewal process $X$ with arrival times $(T_n)_{n\ge1}$, we have

$$\mathbb{E}(T_{X_t+1}) = \mu(m(t) + 1), \quad \text{where } m(t) = \mathbb{E}(X_t),\ \mu = \mathbb{E}(T_1).$$
This ought to be true, because $T_{X_t+1}$ is the sum of $X_t + 1$ interarrival times, each with mean $\mu$.
Taking expectations, we should get $m(t) + 1$ times $\mu$. However, if we condition on $X_t$ we have
to know the distribution of the residual interarrival time after $t$, but without lack of memory,
we are stuck.

Proof: Let us do a one-step analysis on the quantity of interest $g(t) = \mathbb{E}(T_{X_t+1})$:

$$g(t) = \int_0^{\infty} \mathbb{E}(T_{X_t+1} \mid T_1 = s)f(s)\,ds = \int_0^t \big(s + \mathbb{E}(T_{X_{t-s}+1})\big)f(s)\,ds + \int_t^{\infty} s f(s)\,ds = \mu + (g*f)(t).$$

This is almost the renewal equation. In fact, $g_1(t) = g(t)/\mu - 1$ satisfies the renewal equation

$$g_1(t) = \frac{1}{\mu}\int_0^t g(t-s)f(s)\,ds = \int_0^t \big(g_1(t-s) + 1\big)f(s)\,ds = F(t) + (g_1 * f)(t),$$

and, by Proposition 76, $g_1(t) = m(t)$, i.e. $g(t) = \mu(1 + m(t))$ as required. □
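Proof of Theorem 80: Clearly $t < \mathbb{E}(T_{X_t+1}) = \mu(m(t) + 1)$ gives the lower bound

$$\liminf_{t\to\infty} \frac{m(t)}{t} \ge \frac{1}{\mu}.$$

For the upper bound we use a truncation argument and introduce

$$\tilde{Z}_j = Z_j \wedge a = \begin{cases} Z_j & \text{if } Z_j < a \\ a & \text{if } Z_j \ge a \end{cases}$$

with associated renewal process $\tilde{X}$. $\tilde{Z}_j \le Z_j$ for all $j \ge 0$ implies $\tilde{X}_t \ge X_t$ for all $t \ge 0$, hence
$\tilde{m}(t) \ge m(t)$. Putting things together, we get from the lemma again

$$t \ge \mathbb{E}(\tilde{T}_{\tilde{X}_t}) = \mathbb{E}(\tilde{T}_{\tilde{X}_t+1}) - \mathbb{E}(\tilde{Z}_{\tilde{X}_t}) = \tilde{\mu}(\tilde{m}(t) + 1) - \mathbb{E}(\tilde{Z}_{\tilde{X}_t}) \ge \tilde{\mu}(m(t) + 1) - a.$$

Therefore

$$\frac{m(t)}{t} \le \frac{1}{\tilde{\mu}} + \frac{a - \tilde{\mu}}{\tilde{\mu}\, t}$$

so that

$$\limsup_{t\to\infty} \frac{m(t)}{t} \le \frac{1}{\tilde{\mu}}.$$

Now $\tilde{\mu} = \mathbb{E}(\tilde{Z}_1) = \mathbb{E}(Z_1 \wedge a) \to \mathbb{E}(Z_1) = \mu$ as $a \to \infty$ (by monotone convergence). Therefore

$$\limsup_{t\to\infty} \frac{m(t)}{t} \le \frac{1}{\mu}.$$

□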

Note that truncation was necessary to get $\mathbb{E}(\tilde{Z}_{\tilde{X}_t}) \le a$. It would have been enough if we
had $\mathbb{E}(Z_{X_t}) = \mathbb{E}(Z_1) = \mu$, but this is not true. Look at the Poisson process as an example. We
know that the residual lifetime has already mean $\mu = 1/\lambda$, but there is also the part of $Z_{X_t}$
before time $t$. We will explore this in Lecture 11 when we discuss residual lifetimes in renewal
theory.



Lecture 11 

Excess life and stationarity 



Reading: Grimmett-Stirzaker 10.3-10.4; Ross 7.7 

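So far, we have studied the behaviour of one-dimensional marginals $X_t$:

$$\frac{X_t}{t} \to \frac{1}{\mu}, \qquad \frac{\mathbb{E}(X_t)}{t} \to \frac{1}{\mu}, \qquad \frac{X_t - t/\mu}{\sqrt{t\sigma^2/\mu^3}} \to N(0,1).$$

For Poisson processes we also studied finite-dimensional marginals and described the joint
distributions of

$$X_t,\ X_{t+s} - X_t,\ \ldots \quad \text{stationary, independent increments.}$$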

In this lecture, we will make some progress with such a programme for renewal processes. 



11.1 The renewal property 



To begin with, let us study the post-$t$ process for a renewal process $X$, i.e. $(X_{t+s} - X_t)_{s\ge0}$.
For fixed $t$, this is not a renewal process in the strict sense, but for certain random $t = T$
we have

Proposition 82 Let $X$ be a renewal process, $T_i = \inf\{t \ge 0 : X_t = i\}$. Then $(X_r)_{r\le T_i}$
and $(X_{T_i+s} - X_{T_i})_{s\ge0}$ are independent and $(X_{T_i+s} - X_{T_i})_{s\ge0}$ has the same distribution as
$X$.

Proof: The proof is the same as (actually easier than) the proof of the strong Markov
property of birth processes at $T_i$, cf. Exercise A.2.3(a). We identify the interarrival times
$\tilde{Z}_n = Z_{i+n}$, $n \ge 0$, independent of $Z_0, \ldots, Z_{i-1}$. Here, as for the special case of Poisson
processes, we describe $\tilde{X} = (X_{T_i+s} - X_{T_i})_{s\ge0}$ rather than $(X_{T_i+s})_{s\ge0}$ since the former is
a renewal process (in the strict sense) whereas the latter starts at $i$. □

Here are two simple examples that show why this cannot be extended to fixed times
$t = T$.
Example 83 If the interarrival times are constant, say $\mathbb{P}(Z_n = 3) = 1$, then $\tilde{X} =
(X_{t+s} - X_t)_{s\ge0}$ has a first arrival time $\tilde{Z}_0$ with $\mathbb{P}(\tilde{Z}_0 = 3 - t) = 1$, for $0 \le t < 3$.

The second example shows that also the independence of the pre-$t$ and post-$t$ processes
fails.

Example 84 Suppose the interarrival times can take three values, say $\mathbb{P}(Z_n = 1) = 0.7$,
$\mathbb{P}(Z_n = 2) = 0.2$ and $\mathbb{P}(Z_n = 19) = 0.1$. Note

$$\mathbb{E}(Z_n) = 0.7 \times 1 + 0.2 \times 2 + 0.1 \times 19 = 3.$$

Let us investigate a potential renewal property at time $t = 2$. Denoting $\tilde{X} = (X_{t+s} -
X_t)_{s\ge0}$ with holding times $\tilde{Z}_n$, $n \ge 0$, we have

(1) $X_2 = 2$ implies $X_1 = 1$, $Z_0 = Z_1 = 1$, and we get $\mathbb{P}(\tilde{Z}_0 = j \mid X_2 = 2) = \mathbb{P}(Z_2 = j)$;

(2) $X_2 = 1$ and $X_1 = 1$ implies $Z_0 = 1$ and $Z_1 \ge 2$, we get $\mathbb{P}(\tilde{Z}_0 = 1 \mid X_2 = 1, X_1 = 1) =
\mathbb{P}(Z_1 = 2 \mid Z_1 \ge 2) = 0.2/0.3 = 2/3$;

(3) $X_2 = 1$ and $X_1 = 0$ implies $Z_0 = 2$, we get $\mathbb{P}(\tilde{Z}_0 = 1 \mid X_2 = 1, X_1 = 0) = 0.7$;

(4) $X_2 = 0$ implies $Z_0 = 19$, we get $\mathbb{P}(\tilde{Z}_0 = 17 \mid X_2 = 0) = 1$.

From (1) and (4) we obtain that $\tilde{Z}_0$ is not independent of $X_2$ and hence $\tilde{X}$ depends on
$(X_r)_{r\le2}$.

From (2) and (3) we obtain that $\tilde{Z}_0$ is not conditionally independent of $X_1$ given
$X_2 = 1$, so $\tilde{X}$ depends on $(X_r)_{r\le2}$ even conditionally given $X_2 = 1$.
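The four conditional probabilities of Example 84 can be confirmed by exact enumeration. The sketch below lists all values of the first three interarrival times, which determine $X_1$, $X_2$ and the first arrival time of the post-2 process, and uses exact fractions. The helper `conditional` and its argument names are hypothetical, introduced only for this illustration.

```python
from fractions import Fraction
from itertools import product

pmf = {1: Fraction(7, 10), 2: Fraction(2, 10), 19: Fraction(1, 10)}
t = 2

joint = {}   # (X_1, X_2, first arrival time of the post-t process) -> probability
for z in product(pmf, repeat=3):           # three interarrival times suffice: T_3 >= 3 > t
    weight = pmf[z[0]] * pmf[z[1]] * pmf[z[2]]
    arrival_times = [z[0], z[0] + z[1], z[0] + z[1] + z[2]]   # T_1, T_2, T_3
    x1 = sum(a <= 1 for a in arrival_times)
    x2 = sum(a <= t for a in arrival_times)
    excess = arrival_times[x2] - t         # T_{X_t + 1} - t
    key = (x1, x2, excess)
    joint[key] = joint.get(key, Fraction(0)) + weight

def conditional(excess_value, x1=None, x2=None):
    """P(excess = excess_value | X_1 = x1, X_2 = x2); None means 'not conditioned on'."""
    match = lambda k: (x1 is None or k[0] == x1) and (x2 is None or k[1] == x2)
    denom = sum(p for k, p in joint.items() if match(k))
    numer = sum(p for k, p in joint.items() if match(k) and k[2] == excess_value)
    return numer / denom
```

For instance, `conditional(1, x1=1, x2=1)` and `conditional(1, x1=0, x2=1)` give the two different values from items (2) and (3).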

In general, the situation is more involved, but you can imagine that knowledge of
the age $A_t = t - T_{X_t}$ at time $t$ gives you information about the excess lifetime (residual
lifetime) $E_t = T_{X_t+1} - t$ which is the first arrival time $\tilde{Z}_0$ of the post-$t$ process. In
particular, $A_t$ and $E_t$ are not independent, and also the distribution of $E_t$ depends on $t$.
Note that if the distribution of $E_t$ did not depend on $A_t$, we would get (something very
close to, not exactly, since $A_t$ is random) the lack of memory property and expect that
$Z_j$ is exponential.

However, this is all about $\tilde{Z}_0$; we have not met any problems for the following interarrival
times. Therefore, we define "delayed" renewal processes, where the first renewal
time is different from the other inter-renewal times. In other words, the typical renewal
behaviour is delayed until the first renewal time. Our main interpretation will be that
$Z_0$ is just part of an inter-renewal time, but we define a more general concept within
which we develop the idea of partial renewal times, which will show some effects that
seem paradoxical at first sight.

Definition 85 Let $(Z_n)_{n\ge1}$ be a family of independent and identically distributed interarrival
times and $Z_0$ an independent interarrival time with possibly different distribution.
Then the associated counting process $X$ with interarrival times $(Z_n)_{n\ge0}$ is called a delayed
renewal process.
As a corollary to Proposition 82, we obtain:

Corollary 86 (Renewal property) Proposition 82 also remains true for delayed renewal
processes. In this case, the post-$T_i$ process will be undelayed.

Just as the Markov property holds at more general stopping times, the renewal property
also holds at more general stopping times, provided that they take values in the
set $\{T_n, n \ge 0\}$ of renewal times. Here is a heuristic argument that can be made precise:
we apply Proposition 82 conditionally on $T = T_i$ and get conditional independence
and the conditional distribution of the post-$T$ process given $T = T_i$. We see that the
conditional distribution of the post-$T$ process does not depend on $i$ and apply Exercise
A.1.5 to deduce the unconditional distribution of the post-$T$ process and unconditional
independence.

In particular, the renewal property does not apply to fixed $T = t$, but it does apply,
e.g., to the first arrival (renewal) after time $t$, which is $T_{X_t+1}$. Actually, this special case
of the renewal property can be proved directly by conditioning on $X_t$.

Proposition 87 Given a (possibly delayed) renewal process $X$, for every $t \ge 0$, the post-$t$
process $\tilde{X} = (X_{t+s} - X_t)_{s\ge0}$ is a delayed renewal process with $\tilde{Z}_0 = E_t$.

Remark 88 Note that we make no statement about the dependence on the pre-$t$ process.

Proof: We apply the renewal property to the renewal time $T_{X_t+1}$, the first renewal time
after $t$. This establishes that $(\tilde{Z}_n)_{n\ge1}$ are independent identically distributed interarrival
times, independent from the past, in particular from $\tilde{Z}_0 = E_t$. $\tilde{X}$ therefore has the
structure of a delayed renewal process. □

11.2 Size-biased picks and stationarity 

An important step towards further limit results will be to choose a good initial delay 
distribution. We cannot achieve independence of increments, but we will show that we 
can achieve stationarity of increments. 

Proposition 89 Let $X$ be a delayed renewal process. If the distribution of the excesses
$E_t$ does not depend on $t \ge 0$, then $X$ has stationary increments.

Proof: By Proposition 87, the post-$t$ process $\tilde{X} = (X_{t+s} - X_t)_{s\ge0}$ is a delayed renewal
process with $\tilde{Z}_0 = E_t$. If the delay distribution does not depend on $t$, then the distribution
of $\tilde{X}$ does not depend on $t$. In particular, the distribution of $\tilde{X}_s = X_{t+s} - X_t$, an increment
of $X$, does not depend on $t$. □
In fact, the converse is true, as well. Also, it can be shown that $E$ is a Markov
process on the uncountable state space $\mathbb{S} = [0, \infty)$, and we are looking for its invariant
distribution.

Since we have not identified the invariant distribution, we should look at $E_t$ for large
$t$ for a non-delayed renewal process since equilibrium (for Markov chains at least) is
established asymptotically. We may either appeal to the Convergence Theorem and look
at the distribution of $E_t$, say $\mathbb{P}(E_t > x)$ for large $t$, or we may appeal to the Ergodic
Theorem and look at the proportion of $s \in [0, t]$ for which $E_s > x$.

Example 90 Let us continue the discussion of Example 84 and look at $E_t$ for large $t$.
First we notice that the fractional parts $\{E_t\}$ have a deterministic sawtooth behaviour
forever since $Z_n$, $n \ge 0$, only take integer values. So, look at $E_n$ for large $n$. 1. What is
the probability that $n$ falls into a long interarrival time? Or 2. What proportion of $n$s
fall into a long interarrival time? 10%?

In fact, $E_n$ is a discrete-time Markov chain on $\{1, 2, \ldots, 19\}$ with transition probabilities

$$\pi_{i+1,i} = 1, \quad i \ge 1, \qquad \pi_{1,j} = \mathbb{P}(Z_1 = j),$$

which is clearly irreducible and positive recurrent if $\mathbb{E}_1(H_1) = \mathbb{E}(Z_1) < \infty$ for the return
time $H_1$ to state 1, so it has a unique stationary distribution. This stationary distribution can
be calculated in a general setting of $\mathbb{N}$-valued interarrival times, see Assignment 6. Let us here
work on the intuition:
The answer to the first question is essentially the probability of being in a state $3, 4, \ldots, 19$
under the stationary distribution (Convergence Theorem for $\mathbb{P}(E_n = j)$), whereas the
answer to the second question is essentially the probability of being in a state $3, 4, \ldots, 19$
under the stationary distribution (Ergodic Theorem for $n^{-1}\#\{k = 0, \ldots, n-1 : E_k = j\}$).

In a typical period of 30 time units, we will have 10 arrivals, of which two will be
separated by one long interarrival time. More precisely, on average 11 $n$s out of 30 will
fall into small interarrival times, and 19 will fall into the long interarrival time. This is
called a size-biased pick since 19 is the size of the long interarrival time. Likewise, 1 and 2 are
the sizes of the short interarrival times, and these are weighted with their seven and two times
higher proportions of occurrence, respectively.

Furthermore, of these 19 that fall into the long interarrival time, one each has an
excess of $k$ for $k = 1, \ldots, 19$, i.e. we may expect the split of the current interarrival time
into age and excess to be uniformly distributed over the interarrival time.
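The numbers in Example 90 can be checked exactly. The sketch below assumes the candidate stationary law $\pi_j = \mathbb{P}(Z_1 \ge j)/\mu$ for the excess chain (the general formula is the subject of Assignment 6, so it is used here only as a guess to be verified), applies one step of the transition mechanism, and also computes the size-biased weights $n\,p_n/\mu$. All names are illustrative.

```python
from fractions import Fraction

pmf = {1: Fraction(7, 10), 2: Fraction(2, 10), 19: Fraction(1, 10)}
mean = sum(n * p for n, p in pmf.items())                  # = 3
states = range(1, 20)

# Candidate stationary law of the excess chain: pi_j = P(Z >= j) / mean.
pi = {j: sum(p for n, p in pmf.items() if n >= j) / mean for j in states}

def step(dist):
    """One step of the chain: i+1 -> i with probability 1, and 1 -> j with probability P(Z = j)."""
    new = {j: Fraction(0) for j in states}
    for i, mass in dist.items():
        if i == 1:
            for j, p in pmf.items():
                new[j] += mass * p
        else:
            new[i - 1] += mass
    return new

size_biased = {n: n * p / mean for n, p in pmf.items()}   # weights n * p_n / mean
```

Here `step(pi)` returns `pi` again, and `size_biased[19]` equals 19/30, matching the count of 19 out of 30 time points. The stationary mass on the states 3, ..., 19 is 17/30; the remaining 2/30 corresponds to the two integer times in each long interval at which the excess is already 1 or 2.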

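Definition 91 With a probability mass function $p$ on $\mathbb{N}$ with $\mu = \sum_{n\ge0} n p_n < \infty$ we
associate the size-biased pick distribution $p^{sb}$ as

$$p^{sb}_n = \frac{n p_n}{\mu}, \quad n \ge 1.$$

With a probability density function $f$ on $[0, \infty)$ with $\mu = \int_0^{\infty} t f(t)\,dt < \infty$, we associate
the size-biased pick distribution

$$f^{sb}(z) = \frac{z f(z)}{\mu}, \quad z \ge 0.$$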
Example 90 is for an $\mathbb{N}$-valued interarrival distribution, where the renewal process
$(X_t)_{t\ge0}$ will only see renewals at integer times, so it makes sense to consider the discretised
process $(X_n)_{n\ge0}$ as well as a discretised process $(E_n)_{n\ge0}$ of excesses at integer times. The
story with continuous interarrival times is completely analogous but does not tie in with
the theory of discrete-time Markov chains, rather with $[0, \infty)$-valued Markov processes.
We can still calculate

$$\frac{1}{T_k}\int_0^{T_k} 1_{\{E_s > y\}}\,ds = \frac{1}{T_k}\sum_{j=0}^{k-1} (Z_j - y)1_{\{Z_j > y\}}$$

using the Strong Law of Large Numbers on $(Z_j)_{j\ge0}$ and $Y_j = (Z_j - y)1_{\{Z_j > y\}}$, $j \ge 0$, to
obtain

$$\bar{F}_0(y) := \frac{1}{\mu}\int_y^{\infty} (z-y)f(z)\,dz$$

as (survival function $\mathbb{P}(Z_0 > y)$ of the) proposed stationary distribution for $(E_t)_{t\ge0}$.

It is not hard to show that if $L$ has the size-biased pick distribution and $U$ is uniform
on $[0, 1]$ then $LU$ has density

$$f_0(y) = \frac{1}{\mu}\bar{F}(y).$$

Just check

$$\mathbb{P}(LU > x) = \int_0^1 \mathbb{P}(L > x/u \mid U = u)\,du = \int_0^1\int_{x/u}^{\infty} \frac{z}{\mu}f(z)\,dz\,du = \int_x^{\infty}\int_{x/z}^1 du\,\frac{z}{\mu}f(z)\,dz = \frac{1}{\mu}\int_x^{\infty} (z-x)f(z)\,dz$$

and differentiate to get

$$f_0(x) = -\frac{d}{dx}\mathbb{P}(LU > x) = \frac{1}{\mu}xf(x) + \frac{1}{\mu}\bar{F}(x) - \frac{1}{\mu}xf(x) = \frac{1}{\mu}\bar{F}(x).$$

If there is uniqueness of stationary distribution, Ergodic Theorem, etc. for the Markov
process $(E_t)_{t\ge0}$, we obtain the following result.

Proposition 92 Let $X$ be a delayed renewal process where $\mathbb{P}(Z_n > y) = \bar{F}(y)$, $n \ge 1$,
and $\mathbb{P}(Z_0 > y) = \bar{F}_0(y)$. Then $X$ has stationary increments, and $E_t \sim LU$ where $L \sim F^{sb}$
and $U \sim U(0, 1)$ independent, for all $t \ge 0$.

Proof: A proof within the framework of this course is on Assignment 6. □

Example 93 Clearly, for the $\mathrm{Exp}(\lambda)$ distribution as inter-arrival distribution, we get
$f_0(x) = \lambda e^{-\lambda x}$ also. Note, however, that

$$f^{sb}(x) = \frac{x f(x)}{\mu} = \lambda^2 x e^{-\lambda x}, \quad x \ge 0,$$

is the probability density function of a $\mathrm{Gamma}(2, \lambda)$ distribution, the distribution of the
sum of two independent $\mathrm{Exp}(\lambda)$ random variables.
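A quick Monte Carlo check of this example: if $L \sim \mathrm{Gamma}(2, \lambda)$ and $U$ is an independent uniform variable on $[0, 1]$, then $LU$ should again be $\mathrm{Exp}(\lambda)$. The sketch below, with an assumed seed, $\lambda = 1$ and an arbitrary sample size, compares the sample mean with $1/\lambda$ and the empirical tail at 1 with $e^{-\lambda}$.

```python
import math
import random

rng = random.Random(7)          # fixed seed, illustrative choice
lam, n_samples = 1.0, 100_000

# L ~ Gamma(2, lam): random.gammavariate takes the shape alpha and the scale beta = 1/lam.
samples = [rng.gammavariate(2.0, 1.0 / lam) * rng.random() for _ in range(n_samples)]

sample_mean = sum(samples) / n_samples                    # compare with 1/lam
tail_at_one = sum(x > 1.0 for x in samples) / n_samples   # compare with exp(-lam)
```

The size-biased interval itself has mean $2/\lambda$, twice the mean of a typical interarrival time, which is the effect usually called the inspection paradox.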
Lecture 12



Convergence to equilibrium — 
renewal theorems 

Reading: Grimmett-Stirzaker 10.4

Equilibrium for renewal processes $X$ (increasing processes!) cannot be understood
in the same way as for Markov chains. We rather want increment processes $X_{t+s} - X_t$,
$t \ge 0$, to form a stationary process for all fixed $s$. Note that we look at $X_{t+s} - X_t$ as a
process in $t$, so we are looking at all increments over a fixed interval length. This is not
the post-$t$ process that takes $X_{t+s} - X_t$ as a process in $s$.

12.1 Convergence to equilibrium 

We will have to treat the discrete and continuous cases separately in most of the sequel. 
More precisely, periodicity is an issue for discrete interarrival times. The general notion 
needed is that of arithmetic and non-arithmetic distributions: 

Definition 94 A nonnegative random variable $Z$ (and its distribution) is called $d$-arithmetic
if $\mathbb{P}(Z \in d\mathbb{N}) = 1$, and $d$ is maximal with this property. If $Z$ is not $d$-arithmetic
for any $d > 0$, it is called non-arithmetic.

We think of $Z$ as an interarrival time. All continuous distributions are non-arithmetic,
but there are other non-arithmetic distributions, which will not be relevant for us. We will
mostly focus on the continuous case.
Our second focus is the 1-arithmetic case, i.e. the integer-valued case, where additionally
$\mathbb{P}(Z \in d\mathbb{N}) < 1$ for all $d \ge 2$. It is not difficult to formulate corresponding results for
$d$-arithmetic interarrival distributions and also for non-arithmetic ones, once an appropriate
definition of a size-biased distribution is found.

Fact 95 (Convergence in distribution) Let $X$ be a (possibly delayed) renewal process
having interarrival times with finite mean $\mu$.

(a) If the interarrival distribution is continuous, then

$$X_{t+s} - X_t \to \tilde{X}_s \quad \text{in distribution, as } t \to \infty,$$

where $\tilde{X}$ is an associated stationary renewal process.

Also $(A_t, E_t) \to (L(1-U), LU)$ in distribution, as $t \to \infty$, where $L$ has the size-biased
density $f^{sb}$ and $U \sim \mathrm{Unif}(0, 1)$ is independent of $L$.

(b) If the interarrival distribution is integer-valued and 1-arithmetic, then

$$X_{n+s} - X_n \to \tilde{X}_s \quad \text{in distribution, as } n \to \infty,$$

where $\tilde{X}$ is an associated delayed renewal process, not with continuous first arrival
$LU$, but with a discrete version: let $L$ have size-biased probability mass function
$p^{sb}$ and $\tilde{Z}_0$ conditionally uniform on $\{1, \ldots, L\}$. Also, $(A_n, E_n) \to (L - \tilde{Z}_0, \tilde{Z}_0)$ in
distribution, as $n \to \infty$.

We will give a sketch of the coupling proof for the arithmetic case below.



12.2 Renewal theorems 

The renewal theorems are now extensions of Fact 95 to convergence of certain moments. 
The renewal theorem itself concerns means. It is a refinement of the Elementary Renewal 
Theorem to increments, i.e. a second-order result. 

Fact 96 (Renewal theorem) Let $X$ be a (possibly delayed) renewal process having interarrival
times with finite mean $\mu$ and renewal function $m(t) = \mathbb{E}(X_t)$.

(a) If the interarrival times are non-arithmetic, then for all $h \ge 0$

$$m(t+h) - m(t) \to \frac{h}{\mu} \quad \text{as } t \to \infty.$$

(b) If the interarrival times are 1-arithmetic, then for all $h \in \mathbb{N}$

$$m(t+h) - m(t) \to \frac{h}{\mu} \quad \text{as } t \to \infty.$$
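Part (b) can be illustrated numerically for the interarrival distribution of Example 84, which is 1-arithmetic with $\mu = 3$. For an undelayed renewal process with integer-valued interarrival times, the probability $u_n$ that some arrival happens exactly at time $n$ satisfies the discrete renewal recursion $u_n = \sum_k p_k u_{n-k}$ with $u_0 = 1$, and $m(t+h) - m(t) = u_{t+1} + \cdots + u_{t+h}$ for integers $t, h$. The sketch below iterates this recursion; the truncation level `n_max` is an arbitrary choice.

```python
pmf = {1: 0.7, 2: 0.2, 19: 0.1}            # interarrival law of Example 84 (support has gcd 1)
mean = sum(n * p for n, p in pmf.items())  # = 3 up to rounding

n_max = 1000                               # illustrative truncation level
u = [1.0] + [0.0] * n_max                  # u[n] = P(some arrival time equals n); u[0] = 1 by convention
for n in range(1, n_max + 1):
    # condition on the size k of the last interarrival time ending at n
    u[n] = sum(p * u[n - k] for k, p in pmf.items() if k <= n)

def renewal_increment(t, h):
    """m(t + h) - m(t) = u[t+1] + ... + u[t+h] for integers t, h >= 0."""
    return sum(u[t + 1 : t + h + 1])
```

For large `t`, `renewal_increment(t, h)` is close to `h / mean`, in line with the renewal theorem.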

As a generalisation that is often useful in applications, we mention a special case of 
the key renewal theorem: 

Fact 97 (Key renewal theorem) Let $X$ be a renewal process with continuous interarrival
time distribution and $m(t) = \mathbb{E}(X_t)$. If $g : [0, \infty) \to [0, \infty)$ is integrable (over
$[0, \infty)$ in the limiting sense) and non-increasing, then

$$(g * m')(t) = \int_0^t g(t-x)\,m'(x)\,dx \to \frac{1}{\mu}\int_0^{\infty} g(x)\,dx \quad \text{as } t \to \infty.$$
There are generalisations that allow bigger classes of $g$ (directly Riemann-integrable
functions) and put no restrictions on the interarrival distribution (at the cost of some
limits through discrete lattices in the $d$-arithmetic case). Even an infinite mean can be
shown to correspond to zero limits.

Note that for $g_h(x) = 1_{[0,h]}(x)$, we get

$$(g_h * m')(t) = \int_{t-h}^{t} m'(x)\,dx = m(t) - m(t-h), \quad t \ge h,$$

and this leads to the renewal theorem. The renewal theorem and the key renewal theorem
should be thought of as results where time windows are sent to infinity, and a stationary
picture is obtained in the limit. In the case of the renewal theorem, we are only looking at
the mean of an increment. In the key renewal theorem, we can consider other quantities
related to the mean behaviour in a window. E.g., moments of excess lifetimes fall into
this category. Note the general scheme that e.g. $\{E_t \le h\}$ is a quantity only depending
on a time window of size $h$.

Vice versa, the key renewal theorem can be deduced from the renewal theorem by 
approximating g by step functions. 



12.3 The coupling proofs

Suppose now that $X$ is a renewal process with integer-valued interarrival times, and
suppose that $\mathbb{P}(Z_1 \in d\mathbb{N}) < 1$ for all $d \ge 2$, i.e. suppose that $Z_1$ is 1-arithmetic. Let
$\tilde{X}$ be a renewal process that is not itself stationary, but that is such that $(\tilde{X}_n)_{n\ge0}$ has
stationary increments and such that $\tilde{X}$ has also a 1-arithmetic first arrival time. This can
be achieved by choosing $L$ with the size-biased pick distribution $p^{sb}$, which is 1-arithmetic,
and choosing $\tilde{Z}_0$ conditionally uniform on $\{1, \ldots, L\}$.

We want to couple these two independent processes $X$ and $\tilde{X}$. Define $N = \inf\{n \ge
1 : T_n = \tilde{T}_n\}$, the first index of simultaneous arrivals. Note that we require not only that
arrivals happen at the same time, but that the index is the same. We can show that
$\mathbb{P}(N < \infty) = 1$ by invoking a result on centered random walks. In fact $N$ is the first
time the random walk $S_n = T_n - \tilde{T}_n$ hits 0. $S$ is not a simple random walk, but it can
be shown that it is recurrent (hence hits 0 in finite time) since $\mathbb{E}(Z_1 - \tilde{Z}_1) = 0$.

Now the idea is similar to the one in the coupling proof of the convergence theorem
for Markov chains. We construct a new process

$$\bar{Z}_n = \begin{cases} Z_n & \text{if } n < N \\ \tilde{Z}_n & \text{if } n \ge N \end{cases} \qquad \text{so that} \qquad \bar{X}_t = \begin{cases} X_t & \text{if } t < T_N \\ \tilde{X}_t & \text{if } t \ge T_N. \end{cases}$$
Now $\bar{X} \sim X$ and we get, in fact

$$\mathbb{P}((X_{t+s} - X_t)_{s\ge0} \in A) = \mathbb{P}(T_N \le t,\ (\tilde{X}_{t+s} - \tilde{X}_t)_{s\ge0} \in A) + \mathbb{P}(T_N > t,\ (\bar{X}_{t+s} - \bar{X}_t)_{s\ge0} \in A) \to \mathbb{P}((\tilde{X}_{t+s} - \tilde{X}_t)_{s\ge0} \in A),$$

for all suitable $A \subset \{f : [0, \infty) \to \mathbb{N}\}$, since $\mathbb{P}(T_N > t) \to 0$.

This shows convergence of the distribution of $(X_{t+s} - X_t)_{s\ge0}$ to the distribution of $\tilde{X}$
in so-called total variation, which is actually stronger than convergence in distribution
as we claim. In particular, we can now take sets $A = \{f : [0, \infty) \to \mathbb{N} : f(s) = k\}$ to
conclude.

For the renewal theorem, we deduce from this stronger form of convergence, that the
means of $X_{t+s} - X_t$ converge as $t \to \infty$.

For the non-arithmetic case, the proof is harder, since $N = \infty$, and times $N_\varepsilon =
\inf\{n \ge 1 : |T_n - \tilde{T}_n| < \varepsilon\}$ do not achieve a perfect coupling.

There is also a proof using renewal equations. 



12.4 Example 

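Let us investigate the asymptotics of $\mathbb{E}(E_t^r)$. We condition on the last arrival time before
$t$ and that it is the $k$th arrival

$$\mathbb{E}(E_t^r) = \mathbb{E}\big(((Z_0 - t)^+)^r\big) + \sum_{k\ge1}\int_0^t \mathbb{E}\big(((Z_k - (t-x))^+)^r\big) f^{*(k)}(x)\,dx$$

$$= \mathbb{E}\big(((Z_0 - t)^+)^r\big) + \int_0^t \mathbb{E}\big(((Z_1 - (t-x))^+)^r\big)\,m'(x)\,dx.$$

Let us write $h(y) = \mathbb{E}(((Z_1 - y)^+)^r)$. This is clearly a nonnegative nonincreasing
function of $y$ and can be seen to be integrable if and only if $\mathbb{E}(Z_1^{r+1}) < \infty$ (see below).
The Key Renewal Theorem gives

$$\mathbb{E}(E_t^r) \to \frac{1}{\mu}\int_0^{\infty} h(y)\,dy = \frac{\mathbb{E}(Z_1^{r+1})}{\mu(r+1)}$$

since

$$\mathbb{E}\big(((Z_1 - x)^+)^r\big) = \int_x^{\infty} (z-x)^r f(z)\,dz = \int_0^{\infty} y^r f(y+x)\,dy$$

and hence

$$\int_0^{\infty} \mathbb{E}\big(((Z_1 - x)^+)^r\big)\,dx = \int_0^{\infty} y^r \int_0^{\infty} f(y+x)\,dx\,dy = \int_0^{\infty} y^r \bar{F}(y)\,dy = \frac{\mathbb{E}(Z_1^{r+1})}{r+1}.$$

It is now easy to check that, in fact, these are the moments of the limit distribution $LU$
for the excess life $E_t$.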



Lecture 13 

M/M/1 queues and queueing 
networks 

Reading: Norris 5.2.1-5.2.6; Grimmett-Stirzaker 11.2, 11.7; Ross 6.6, 8.4 

Consider a single-server queueing system in which customers arrive according to a
Poisson process of rate $\lambda$ and service times are independent $\mathrm{Exp}(\mu)$. Let $X_t$ denote the
length of the queue at time $t$ including any customer that is currently served. This is the
setting of Exercise A.4.2 and from there we recall that

• An invariant distribution exists if and only if $\lambda < \mu$, and is given by

$$\xi_n = (\lambda/\mu)^n(1 - \lambda/\mu) = \rho^n(1 - \rho), \quad n \ge 0,$$

where $\rho = \lambda/\mu$ is called the traffic intensity. Clearly $\lambda < \mu \iff \rho < 1$. By the
ergodic theorem, the server is busy a (long-term) proportion $\rho$ of the time.

• $\xi_n$ can be best obtained by solving the detailed balance equations. By Proposition
57, $X$ is reversible in equilibrium.

• The embedded "jump chain" $(M_n)_{n\ge0}$, $M_n = X_{T_n}$, has a different invariant distribution
$\eta \ne \xi$ since the holding times are $\mathrm{Exp}(\lambda + \mu)$ everywhere except in 0, where
they are $\mathrm{Exp}(\lambda)$, hence rather longer, so that $X$ spends "more time" in 0 than $M$.
Hence $\xi$ puts higher weight on 0 than $\eta$, again by the ergodic theorem, now in discrete
time. Let us state more explicitly the two ergodic theorems. They assert that we
can obtain the invariant distributions as almost sure limits as $n \to \infty$




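$$\frac{1}{T_n}\int_0^{T_n} 1_{\{X_s = i\}}\,ds = \frac{1}{T_n}\sum_{k=0}^{n-1} Z_k 1_{\{M_k = i\}} \to \xi_i \qquad \text{and} \qquad \frac{1}{n}\sum_{k=0}^{n-1} 1_{\{M_k = i\}} \to \eta_i$$

for all $i \ge 0$, actually in the first case more generally, as $t \to \infty$ where $t$ replaces
the special choice $t = T_n$. Note how the holding times change the proportions as
weights in the sums, $T_n = Z_0 + \cdots + Z_{n-1}$ being just the sum of the weights.

• During any $\mathrm{Exp}(\mu)$ service time, a $\mathrm{geom}(\lambda/(\lambda + \mu))$ number of customers arrives.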
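To see concretely how the jump chain's invariant distribution differs from $\xi$, the following sketch uses the standard relation $\eta_i \propto \xi_i q_i$, where $q_i$ is the rate of the holding time in state $i$, and checks on a truncated range of states that $\eta$ is invariant for the jump chain. The rates $\lambda = 1$, $\mu = 2$ and the truncation level are illustrative assumptions.

```python
lam, mu = 1.0, 2.0
rho = lam / mu
n_states = 40                                   # truncation level for the check, illustrative

xi = [(1 - rho) * rho ** i for i in range(n_states + 2)]
rates = [lam] + [lam + mu] * (n_states + 1)     # holding-time rates q_i
norm = 2 * lam                                  # sum_i xi_i q_i = (1-rho) lam + rho (lam+mu) = 2 lam
eta = [xi[i] * rates[i] / norm for i in range(n_states + 2)]

up, down = lam / (lam + mu), mu / (lam + mu)    # jump probabilities from states i >= 1

def eta_after_one_jump(j):
    """(eta P)_j for the jump chain: 0 -> 1 surely, i -> i+1 w.p. up, i -> i-1 w.p. down."""
    from_below = 0.0 if j == 0 else eta[j - 1] * (1.0 if j == 1 else up)
    from_above = eta[j + 1] * down
    return from_below + from_above
```

With these values `eta[0]` equals $(1-\rho)/2$, half of `xi[0]`, reflecting the longer holding times in state 0.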
13.1 M/M/1 queues and the departure process

Define $D_0 = 0$ and successive departure times

$$D_{n+1} = \inf\{t > D_n : X_t - X_{t-} = -1\}, \quad n \ge 0.$$

Let us study the process $V_n = X_{D_n}$, $n \ge 0$, i.e. the process of queue lengths after departures.
By the lack of memory property of $\mathrm{Exp}(\lambda)$, the geometric random variables $N_n$,
$n \ge 1$, that record the number of new customers between $D_{n-1}$ and $D_n$, are independent.
Therefore, $(V_n)_{n\ge0}$ is a Markov chain, with transition probabilities

$$d_{k,k-1+m} = \left(\frac{\lambda}{\lambda+\mu}\right)^m \frac{\mu}{\lambda+\mu}, \quad k \ge 1,\ m \ge 0.$$

For $k = 0$, we get $d_{0,m} = d_{1,m}$, $m \ge 0$, since the next service only begins when a new
customer enters the system.

Proposition 98 $V$ has invariant distribution $\xi$.

Proof: A simple calculation shows that with $\rho = \lambda/\mu$ and $q = \lambda/(\lambda + \mu)$

$$\sum_{k\in\mathbb{N}} \xi_k d_{k,n} = \xi_0 d_{0,n} + \sum_{k=1}^{n+1} \xi_k d_{k,n} = (1-\rho)q^n(1-q) + (1-\rho)(1-q)q^{n+1}\sum_{k=1}^{n+1}\left(\frac{\rho}{q}\right)^k = \xi_n$$

after bringing the partial geometric progression into closed form and appropriate cancellations. □
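The invariance claimed in Proposition 98 can be checked numerically. The sketch below encodes the transition probabilities $d_{k,n}$ of $V$ and evaluates $\sum_k \xi_k d_{k,n}$, which is a finite sum because only $k = 0, \ldots, n+1$ can reach $n$. The rates $\lambda = 1$, $\mu = 2$ are illustrative.

```python
lam, mu = 1.0, 2.0
rho, q = lam / mu, lam / (lam + mu)

def xi(k):
    """Geometric invariant distribution xi_k = (1 - rho) rho^k."""
    return (1 - rho) * rho ** k

def d(k, n):
    """Transition probability of V from k to n."""
    if k == 0:
        return d(1, n)                       # an empty queue first waits for a customer
    m = n - (k - 1)                          # number of arrivals during one service
    return q ** m * (1 - q) if m >= 0 else 0.0

def xi_after_one_step(n):
    # only k = 0, ..., n+1 can reach n, so the sum is finite
    return sum(xi(k) * d(k, n) for k in range(n + 2))
```

Comparing `xi_after_one_step(n)` with `xi(n)` for a range of `n` reproduces the identity in the proof.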

Note that the conditional distribution of $D_{n+1} - D_n$ given $V_n = k$ is the distribution
of a typical service time $G \sim \mathrm{Exp}(\mu)$ if $k \ge 1$ and the distribution of $Y + G$, where
$Y \sim \mathrm{Exp}(\lambda)$ is a typical interarrival time, if $k = 0$ since we have to wait for a new customer
and his service. We can also calculate the unconditional distribution of $D_{n+1} - D_n$, at
least if $V$ is in equilibrium.

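Proposition 99 If $X$ (and hence $V$) is in equilibrium, then the $D_{n+1} - D_n$ are independent
$\mathrm{Exp}(\lambda)$ distributed.

Proof: Let us first study $D_1$. We can calculate its moment generating function by
Proposition 7 a), conditioning on $V_0$, which has the stationary distribution $\xi$:

$$\mathbb{E}(e^{\gamma D_1}) = \mathbb{E}(e^{\gamma D_1} \mid V_0 = 0)\,\mathbb{P}(V_0 = 0) + \sum_{k=1}^{\infty} \mathbb{E}(e^{\gamma D_1} \mid V_0 = k)\,\mathbb{P}(V_0 = k)$$

$$= \frac{\lambda}{\lambda-\gamma}\cdot\frac{\mu}{\mu-\gamma}\left(1 - \frac{\lambda}{\mu}\right) + \frac{\mu}{\mu-\gamma}\cdot\frac{\lambda}{\mu} = \frac{\lambda}{\mu-\gamma}\cdot\frac{\mu-\lambda+\lambda-\gamma}{\lambda-\gamma} = \frac{\lambda}{\lambda-\gamma}$$

and identify the $\mathrm{Exp}(\lambda)$ distribution.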
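For independence of $V_1$ and $D_1$ we have to extend the above calculation and check
that

$$\mathbb{E}(e^{\gamma D_1} \alpha^{V_1}) = \frac{\lambda}{\lambda-\gamma}\,\frac{\mu-\lambda}{\mu-\alpha\lambda},$$

because the second ratio is the probability generating function of the $\mathrm{geom}(\lambda/\mu)$ stationary
distribution $\xi$. To do this, condition on $V_0 \sim \xi$ and then on $D_1$:

$$\mathbb{E}(e^{\gamma D_1} \alpha^{V_1}) = \sum_{k=0}^{\infty} \mathbb{E}(e^{\gamma D_1} \alpha^{V_1} \mid V_0 = k)\,\mathbb{P}(V_0 = k)$$

and use the fact that given $V_0 = k \ge 1$, $V_1 = k + N_1 - 1$, where $N_1 \sim \mathrm{Poi}(\lambda x)$ conditionally
given $D_1 = x$, because $N_1$ is counting Poisson arrivals in an interval of length $D_1 = x$:

$$\begin{aligned}
\mathbb{E}(e^{\gamma D_1} \alpha^{V_1} \mid V_0 = k) &= \alpha^{k-1} \int_0^\infty \mathbb{E}(e^{\gamma x} \alpha^{N_1} \mid V_0 = k, D_1 = x)\, f_{D_1}(x)\,dx \\
&= \alpha^{k-1} \int_0^\infty e^{\gamma x} \exp\{-\lambda x (1-\alpha)\}\, f_{D_1}(x)\,dx \\
&= \alpha^{k-1}\, \frac{\mu}{\mu - \gamma + \lambda(1-\alpha)}.
\end{aligned}$$

For $k = 0$, we get the same expression without $\alpha^{k-1}$ and with a factor $\lambda/(\lambda-\gamma)$,
because $D_1 = Y + G$, where no arrivals occur during $Y$, and $N_1$ is counting those during
$G \sim \mathrm{Exp}(\mu)$. Putting things together, we get

$$\mathbb{E}(e^{\gamma D_1} \alpha^{V_1}) = (1-\rho)\left(\frac{\lambda}{\lambda-\gamma} + \frac{\rho}{1-\rho\alpha}\right) \frac{\mu}{\mu - \gamma + \lambda(1-\alpha)},$$

which simplifies to the expression claimed.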

Now an induction shows $D_{n+1} - D_n \sim \mathrm{Exp}(\lambda)$, and they are independent, because the
strong Markov property at $D_n$ makes the system start afresh conditionally independently
of the past given $V_n$. Since $D_1, \dots, D_n - D_{n-1}$ are independent of $V_n$, they are then also
independent of the whole post-$D_n$ process. □
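Proposition 99 also lends itself to a quick simulation check. The sketch below is only an illustration under stated assumptions: the queue is started from the geometric equilibrium distribution at time 0 (with $D_0 = 0$), the function name and seed are arbitrary, and the comparison is with the $\mathrm{Exp}(\lambda)$ mean $1/\lambda$ only, not with the full distribution or with independence.

```python
import random


def simulate_interdeparture_times(lam, mu, n_departures, seed=0):
    """Simulate an M/M/1 queue started in equilibrium; return D_{n+1} - D_n."""
    rng = random.Random(seed)
    rho = lam / mu
    # Equilibrium start: P(X_0 >= k) = rho**k, i.e. X_0 ~ geom(rho) on {0, 1, ...}.
    x = 0
    while rng.random() < rho:
        x += 1
    t = 0.0
    last_departure = 0.0  # D_0 = 0
    gaps = []
    while len(gaps) < n_departures:
        if x == 0:
            # Empty system: the only possible event is an arrival at rate lam.
            t += rng.expovariate(lam)
            x = 1
        else:
            # Competing exponential clocks: arrival (lam) and service (mu).
            t += rng.expovariate(lam + mu)
            if rng.random() < lam / (lam + mu):
                x += 1
            else:
                x -= 1
                gaps.append(t - last_departure)
                last_departure = t
    return gaps


if __name__ == "__main__":
    gaps = simulate_interdeparture_times(lam=1.0, mu=2.0, n_departures=50000, seed=1)
    print("sample mean:", sum(gaps) / len(gaps), "theoretical mean 1/lambda:", 1.0)
```

The sample mean of the inter-departure times should be close to $1/\lambda$ and not to $1/\mu$, even though every individual service takes $\mathrm{Exp}(\mu)$ time; the idle periods make up the difference.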



The argument is very subtle, because the post-$D_n$ process is actually not independent
of the whole pre-$D_n$ process, just of the departure times. The result, however, is not
surprising since we know that $X$ is reversible, and the departure times of $X$ are the
arrival times of the time-reversed process, which form a Poisson process of rate $\lambda$.

In the same way, we can study $A_0 = 0$ and successive arrival times

$$A_{n+1} = \inf\{t > A_n : X_t - X_{t-} = 1\}, \qquad n \ge 0.$$

Clearly, these also have $\mathrm{Exp}(\lambda)$ increments, since the arrival process is a Poisson process
with rate $\lambda$. We study $X_{A_n}$ in the next lecture in a more general setting.






Lecture Notes - Part B Applied Probability - Oxford MT 2007 



13.2 Tandem queues 

The simplest non-trivial network of queues is a so-called tandem system that consists
of two queues with one server each, having independent $\mathrm{Exp}(\mu_1)$ and $\mathrm{Exp}(\mu_2)$ service
times, respectively. Customers join the first queue according to a Poisson process of rate
$\lambda$, and on completing service immediately enter the second queue. Denote by $X^{(1)}_t$ the
length of the first queue at time $t$ and by $X^{(2)}_t$ the length of the second queue at time $t$.

Proposition 100 The queue length process $X = (X^{(1)}, X^{(2)})$ is a continuous-time Markov
chain with state space $\mathbb{S} = \mathbb{N}^2$ and non-zero transition rates

$$q_{(i,j),(i+1,j)} = \lambda, \qquad q_{(i+1,j),(i,j+1)} = \mu_1, \qquad q_{(i,j+1),(i,j)} = \mu_2, \qquad i, j \in \mathbb{N}.$$

Proof: Just note that in state $(i+1, j+1)$, three exponential clocks are ticking, that
lead to transitions at rates as described. Similarly, there are fewer clocks for $(0, j+1)$,
$(i+1, 0)$ and $(0, 0)$ since one or both servers are idle. The lack of memory property makes
the process start afresh after each transition. Standard reasoning completes the proof.

□ 

Proposition 99 yields that the departure process of the first queue, which is now also
the arrival process of the second queue, is a Poisson process with rate $\lambda$, provided that
the queue is in equilibrium. This can be achieved if $\lambda < \mu_1$.

Proposition 101 $X$ is positive recurrent if and only if $\rho_1 := \lambda/\mu_1 < 1$ and $\rho_2 := \lambda/\mu_2 < 1$.
The unique stationary distribution is then given by

$$\xi_{(i,j)} = \rho_1^i (1-\rho_1)\, \rho_2^j (1-\rho_2),$$

i.e. in equilibrium, the lengths of the two queues at any fixed time are independent.

Proof: As shown in Exercise A.4.3, $\rho_1 \ge 1$ would prevent equilibrium for $X^{(1)}$, and
expected return times for $X$ and $X^{(1)}$ then clearly satisfy $m_{(0,0)} \ge m^{(1)}_0 = \infty$. If $\rho_1 < 1$
and $X^{(1)}$ is in equilibrium, then by Proposition 99, the arrival process for the second queue
is a Poisson process at rate $\lambda$, and $\rho_2 \ge 1$ would prevent equilibrium for $X^{(2)}$. Specifically,
if we assume $m_{(0,0)} < \infty$, then we get the contradiction $\infty = m^{(2)}_0 \le m_{(0,0)} < \infty$.

If $\rho_1 < 1$ and $\rho_2 < 1$, then $\xi$ as given in the statement of the proposition is an invariant
distribution: it is easily checked that the $(i+1, j+1)$ entry of $\xi Q = 0$ holds:

$$\xi_{(i,j+1)}\, q_{(i,j+1),(i+1,j+1)} + \xi_{(i+2,j)}\, q_{(i+2,j),(i+1,j+1)} + \xi_{(i+1,j+2)}\, q_{(i+1,j+2),(i+1,j+1)} + \xi_{(i+1,j+1)}\, q_{(i+1,j+1),(i+1,j+1)} = 0$$

for $i, j \in \mathbb{N}$, and similar equations for states $(0, j+1)$, $(i+1, 0)$ and $(0, 0)$. It is unique
since $X$ is clearly irreducible (we can find paths between any two states in $\mathbb{N}^2$). □
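The balance equations in this proof, including the boundary cases, can be verified mechanically. The following sketch evaluates the $(i,j)$ entry of $\xi Q$ for the product-form distribution of Proposition 101; the helper names are illustrative only.

```python
def product_form(i, j, rho1, rho2):
    """xi_{(i,j)} = rho1**i (1 - rho1) rho2**j (1 - rho2)."""
    return rho1 ** i * (1.0 - rho1) * rho2 ** j * (1.0 - rho2)


def balance_residual(i, j, lam, mu1, mu2):
    """The (i, j) entry of xi Q for the tandem queue; it should vanish."""
    rho1, rho2 = lam / mu1, lam / mu2
    inflow = 0.0
    if i >= 1:
        # External arrival: (i - 1, j) -> (i, j) at rate lam.
        inflow += product_form(i - 1, j, rho1, rho2) * lam
    if j >= 1:
        # Service completion at server 1: (i + 1, j - 1) -> (i, j) at rate mu1.
        inflow += product_form(i + 1, j - 1, rho1, rho2) * mu1
    # Service completion at server 2: (i, j + 1) -> (i, j) at rate mu2.
    inflow += product_form(i, j + 1, rho1, rho2) * mu2
    # Total rate of leaving (i, j); idle servers contribute no clock.
    out_rate = lam + (mu1 if i >= 1 else 0.0) + (mu2 if j >= 1 else 0.0)
    return inflow - product_form(i, j, rho1, rho2) * out_rate


if __name__ == "__main__":
    worst = max(
        abs(balance_residual(i, j, lam=1.0, mu1=2.0, mu2=3.0))
        for i in range(6)
        for j in range(6)
    )
    print("largest balance residual:", worst)
```

The residual is zero up to rounding for interior states as well as for $(0, j+1)$, $(i+1, 0)$ and $(0,0)$.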

We stressed that queue lengths are independent at fixed times. In fact, they are not
independent in a stronger sense, e.g. $(X^{(1)}_s, X^{(1)}_t)$ and $(X^{(2)}_s, X^{(2)}_t)$ for $s < t$ turn out to
be dependent. More specifically, consider $X^{(1)}_s - X^{(1)}_t = n$ for big $n$; then it is easy to
see that $0 < \mathbb{P}(X^{(2)}_t = 0 \mid X^{(1)}_s - X^{(1)}_t = n) \to 0$ as $n \to \infty$, since at least $n$ customers will
then have been served by server 2 also.

13.3 Closed and open migration networks

More general queueing systems are obtained by allowing customers to move in a system of 
m single-server queues according to a Markov chain on {1, ... , m}. For a single customer, 
no queues ever occur, since he is simply served where he goes. If there are r customers in 
the system with no new customers arriving or existing customers departing, the system 
is called a closed migration network. If at some (or all) queues, also new customers arrive 
according to a Poisson process, and at some (or all) queues, customers served may leave 
the system, the system is called an open migration network. 

The tandem queue is an open migration network with $m = 2$, where new customers
only arrive at the first queue and existing customers only leave the system after service
from the second server. The Markov chain is deterministic and sends each customer
from state 1 to state 2: $\pi_{12} = 1$. Customers then go into an absorbing exit state 0, say,
$\pi_{2,0} = 1$, $\pi_{0,0} = 1$.

Fact 102 If service times are independent $\mathrm{Exp}(\mu_k)$ at server $k \in \{1, \dots, m\}$, arrivals
occur according to independent Poisson processes of rates $\lambda_k$, $k = 1, \dots, m$, and departures
are modelled by transitions to another server or an additional state 0, according
to transition probabilities $\pi_{k\ell}$, then the queue-lengths process $X = (X^{(1)}, \dots, X^{(m)})$ is
well-defined and a continuous-time Markov chain. Its transition rates can be given as

$$q_{x,x+e_k} = \lambda_k, \qquad q_{x,x-e_k+e_\ell} = \mu_k \pi_{k\ell}, \qquad q_{x,x-e_k} = \mu_k \pi_{k0}$$

for all $k, \ell \in \{1, \dots, m\}$, $x = (x_1, \dots, x_m) \in \mathbb{N}^m$ such that $x_k \ge 1$ for the latter two, where
$e_k = (0, \dots, 0, 1, 0, \dots, 0)$ is the $k$th unit vector.

Fact 103 If $X = (X^{(1)}, \dots, X^{(m)})$ models a closed migration network with irreducible
migration chain, then the total number of customers $X^{(1)}_t + \dots + X^{(m)}_t$ remains constant
over time, and for any such constant $r$, say, $X$ has a unique invariant distribution given
by

$$\xi_x = B_r \prod_{k=1}^{m} \eta_k^{x_k}, \qquad \text{for all } x \in \mathbb{N}^m \text{ such that } x_1 + \dots + x_m = r,$$

where $\eta$ is the invariant distribution of the continuous-time migration chain and $B_r$ is a
normalising constant.

Note that $\xi$ has a product form, but the queue lengths at servers $k = 1, \dots, m$
under the stationary distribution are not independent, since the admissible $x$-values are
constrained by $x_1 + \dots + x_m = r$.
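To see Fact 103 at work, one can enumerate a small closed network and check global balance directly. The sketch below makes a simplifying assumption that is not in the notes: customers are routed cyclically, $\pi_{k,k+1} = 1$ (indices modulo $m$), so that the continuous-time migration chain has invariant distribution $\eta_k \propto 1/\mu_k$. All names are illustrative.

```python
from itertools import product


def closed_network_check(mu, r):
    """Product-form distribution of a cyclic closed network and its balance error.

    Assumes cyclic routing k -> k + 1 (mod m), for which eta_k is proportional
    to 1 / mu_k. Returns (xi, worst) where worst is the largest absolute
    violation of global balance over all states.
    """
    m = len(mu)
    eta = [1.0 / rate for rate in mu]
    eta_total = sum(eta)
    eta = [e / eta_total for e in eta]

    states = [x for x in product(range(r + 1), repeat=m) if sum(x) == r]
    weight = {}
    for x in states:
        w = 1.0
        for k in range(m):
            w *= eta[k] ** x[k]
        weight[x] = w
    norm = sum(weight.values())  # this is 1 / B_r
    xi = {x: w / norm for x, w in weight.items()}

    worst = 0.0
    for x in states:
        outflow = xi[x] * sum(mu[k] for k in range(m) if x[k] >= 1)
        inflow = 0.0
        for k in range(m):
            nxt = (k + 1) % m
            if x[nxt] >= 1:
                # Predecessor y = x + e_k - e_nxt moves to x at rate mu_k.
                y = list(x)
                y[k] += 1
                y[nxt] -= 1
                inflow += xi[tuple(y)] * mu[k]
        worst = max(worst, abs(inflow - outflow))
    return xi, worst


if __name__ == "__main__":
    xi, worst = closed_network_check(mu=[1.0, 2.0, 3.0], r=4)
    print(len(xi), "states, largest balance violation:", worst)
```

With three servers and four customers there are 15 admissible states, and the product form balances all of them up to rounding error.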
Lecture 14

M/G/1 and G/M/1 queues



Reading: Norris 5.2.7-5.2.8; Grimmett-Stirzaker 11.1; 11.3-11.4; Ross 8.5, 8.7 

Further reading: Grimmett-Stirzaker 11.5-11.6 

The M/M/1 queue is the simplest queueing model. We have seen how it can be applied
and modified in queueing networks, with several servers etc. These were all continuous-time
Markov chains. It was always the exponential distribution that described interarrival
times as well as service times. In practice, this assumption is often unrealistic. If we keep
exponential distributions for either interarrival times or service times, but allow more general
distributions for the other, the model can still be handled using Markov techniques
that we have developed.

We call M/G/1 queue a queue with Markovian arrivals (Poisson process of rate $\lambda$), a
General service time distribution (we also use $G$ for a random variable with this general
distribution on $(0, \infty)$), and 1 server.

We call G/M/1 queue a queue with a General interarrival distribution and Markovian
service times (exponential with rate parameter $\mu$), and 1 server.

There are other queues that have names in this formalism. We have seen M/M/s
queues (Example 30), and also M/M/$\infty$ (queues with an infinite number of servers) -
this model is the same as the immigration-death model that we formulated at the end of
Example 58.



14.1 M/G/1 queues

An M/G/1 queue has independent and identically distributed service times with any
distribution on $(0, \infty)$, but independent $\mathrm{Exp}(\lambda)$ interarrival times. Let $X_t$ be the queue
length at time $t$. $X$ is not a continuous-time Markov chain, since the service distribution
does not have the lack of memory property (unless it is exponential, which brings us back
to M/M/1). This means that after an arrival, we have a nasty residual service distribution.
However, after departures, we have exponential residual interarrival distributions.

Proposition 104 The process of queue lengths $V_n = X_{D_n}$ at successive departure times
$D_n$, $n \ge 0$, is a Markov chain with transition probabilities

$$d_{k,k-1+m} = \mathbb{E}\left(\frac{(\lambda G)^m}{m!}\, e^{-\lambda G}\right), \qquad k \ge 1,\ m \ge 0,$$

and $d_{0,m} = d_{1,m}$, $m \ge 0$. Here $G$ is a (generic) service time.

Proof: The proof is not hard since we recognise the ingredients. Given $G = t$ the
number $N$ of arrivals during the service time has a Poisson distribution with parameter
$\lambda t$. Therefore, if $G$ has density $g$

$$\mathbb{P}(N = m) = \int_0^\infty \mathbb{P}(N = m \mid G = t)\, g(t)\,dt = \int_0^\infty \frac{(\lambda t)^m}{m!}\, e^{-\lambda t} g(t)\,dt = \mathbb{E}\left(\frac{(\lambda G)^m}{m!}\, e^{-\lambda G}\right).$$

If $G$ is discrete, a similar argument works. The rest of the proof is the same as for M/M/1
queues (cf. the discussion before Proposition 98). In particular, when the departing
customer leaves an empty system behind, there has to be an arrival before the next
service time starts. □

For the M/M/1 queue, we defined the traffic intensity $\rho = \lambda/\mu$, in terms of the arrival
rate $\lambda = 1/\mathbb{E}(Y)$ and the (potential) service rate $\mu = 1/\mathbb{E}(G)$ for a generic interarrival
time $Y \sim \mathrm{Exp}(\lambda)$ and service time $G \sim \mathrm{Exp}(\mu)$. We say "potential" service rate, because
in the queueing system, the server may have idle periods (empty system), during which
there is no service. Indeed, a main reason to consider traffic intensities is to describe
whether there are idle periods, i.e. whether the queue length is a recurrent process.

If $G$ is not exponential, we can interpret "service rate" as an asymptotic rate: consider a
renewal process with interrenewal times distributed as $G$. By the strong law of renewal
theory $N_t/t \to 1/\mathbb{E}(G)$. It is therefore natural, for the M/G/1 queue, to define the traffic
intensity as $\rho = \lambda\,\mathbb{E}(G)$.

Proposition 105 Let $\rho = \lambda\,\mathbb{E}(G)$ be the traffic intensity of an M/G/1 queue. If $\rho < 1$,
then $V$ has a unique invariant distribution $\xi$. This $\xi$ has probability generating function

$$\phi(s) = \sum_{k \in \mathbb{N}} \xi_k s^k = (1-\rho)(1-s)\, \frac{1}{1 - s/\mathbb{E}(e^{\lambda(s-1)G})}.$$
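Proof: We define $\xi$ via its probability generating function

$$\phi(s) = \sum_{k \in \mathbb{N}} \xi_k s^k := (1-\rho)(1-s)\, \frac{1}{1 - s/\mathbb{E}(e^{\lambda(s-1)G})}$$

and note that $\xi_0 = \phi(0) = 1 - \rho$. To identify $\xi$ as solution of

$$\xi_j = \sum_{i=0}^{j+1} \xi_i d_{i,j}, \qquad j \ge 0,$$

we can check the corresponding equality of probability generating functions. The probability
generating function of the left-hand side is $\phi(s)$. To calculate the probability
generating function of the right-hand side, calculate first

$$\sum_{m \in \mathbb{N}} d_{k+1,k+m}\, s^m = \sum_{m \in \mathbb{N}} \mathbb{E}\left(\frac{(\lambda G)^m}{m!}\, e^{-\lambda G}\right) s^m = \mathbb{E}(e^{\lambda(s-1)G})$$

and then we have to check that the following sum is equal to $\phi(s)$:

$$\begin{aligned}
\sum_{j \in \mathbb{N}} \sum_{i=0}^{j+1} \xi_i d_{i,j}\, s^j &= \sum_{j \in \mathbb{N}} \xi_0 d_{0,j}\, s^j + \sum_{k \in \mathbb{N}} \sum_{m \in \mathbb{N}} \xi_{k+1} d_{k+1,k+m}\, s^{k+m} \\
&= \mathbb{E}(e^{\lambda(s-1)G}) \left(\xi_0 + \sum_{k=0}^{\infty} \xi_{k+1} s^k\right) \\
&= \mathbb{E}(e^{\lambda(s-1)G})\, s^{-1} \left(\phi(s) - (1-\rho)(1-s)\right),
\end{aligned}$$

but this follows using the definition of $\phi(s)$. This completes the proof since uniqueness
follows from the irreducibility of $V$. □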
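As a sanity check on the generating function of Proposition 105, one can specialise to exponential service times, where the answer must reduce to the geometric generating function $(1-\rho)/(1-\rho s)$ of the M/M/1 queue. The sketch below does this numerically; the function names are illustrative, and the service-time moment generating function is passed in as a plain Python callable.

```python
def queue_length_pgf(s, lam, service_mgf, mean_service):
    """phi(s) = (1 - rho)(1 - s) / (1 - s / E(exp(lam (s - 1) G))), for s != 1."""
    rho = lam * mean_service
    return (1.0 - rho) * (1.0 - s) / (1.0 - s / service_mgf(lam * (s - 1.0)))


def exponential_mgf(mu):
    """Moment generating function gamma -> mu / (mu - gamma) of Exp(mu), gamma < mu."""
    return lambda gamma: mu / (mu - gamma)


def geometric_pgf(s, rho):
    """Generating function of xi_k = (1 - rho) rho**k."""
    return (1.0 - rho) / (1.0 - rho * s)


if __name__ == "__main__":
    lam, mu = 1.0, 2.0
    for s in (0.0, 0.3, 0.7, 0.9):
        print(s, queue_length_pgf(s, lam, exponential_mgf(mu), 1.0 / mu),
              geometric_pgf(s, lam / mu))
```

Other service distributions only require a different `service_mgf`, for example `lambda gamma: math.exp(gamma * d)` for a deterministic service time $d$ (after importing `math`).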



14.2 Waiting times in M/G/1 queues 

An important quantity in queueing theory is the waiting time of a customer. Here we have 
to be specific about the service discipline. We will assume throughout that customers 
queue and are served in their order of arrival. This discipline is called FIFO (First In
First Out). Other disciplines like LIFO (Last In First Out) with or without interruption 
of current service can also be studied. 

Clearly, under the FIFO discipline, the waiting time of a given customer depends on 
the service times of customers in the queue when he arrives. Similarly, all customers in 
the system when a given customer leaves, have arrived during his waiting and service 
times. 



Proposition 106 If $X$ is such that $V$ is in equilibrium, then the waiting time $W$ of any
customer has distribution given by

$$\mathbb{E}(e^{\gamma W}) = \frac{(1-\rho)\gamma}{\lambda + \gamma - \lambda\,\mathbb{E}(e^{\gamma G})}.$$

Proof: Unfortunately, we have not established equilibrium of $X$ at the arrival times
of customers. Therefore, we have to argue from the time when a customer leaves. Due
to the FIFO discipline, he will leave behind all those customers that arrived during his
waiting time $W$ and his service time $G$. Given $T = W + G = t$, their number $N$ has a
Poisson distribution with parameter $\lambda t$ so that

$$\mathbb{E}(s^N) = \int_0^\infty \mathbb{E}(s^N \mid T = t)\, f_T(t)\,dt = \int_0^\infty e^{\lambda t (s-1)} f_T(t)\,dt = \mathbb{E}(e^{\lambda(s-1)T}) = \mathbb{E}(e^{\lambda(s-1)W})\,\mathbb{E}(e^{\lambda(s-1)G}),$$

where the last step uses the independence of $W$ and $G$. From Proposition 105 we take
$\mathbb{E}(s^N)$, and putting $\gamma = \lambda(s-1)$, we deduce the formula required by rearrangement. □

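Corollary 107 In the special case of M/M/1, the distribution of $W$ is given by

$$\mathbb{P}(W = 0) = 1 - \rho \qquad \text{and} \qquad \mathbb{P}(W > w) = \rho\, e^{-(\mu-\lambda)w}, \quad w \ge 0.$$

Proof: We calculate the moment generating function of the proposed distribution

$$e^{\gamma \cdot 0}(1-\rho) + \int_0^\infty e^{\gamma t} \rho(\mu-\lambda) e^{-(\mu-\lambda)t}\,dt = \frac{\mu-\lambda}{\mu} + \frac{\lambda}{\mu}\,\frac{\mu-\lambda}{\mu-\lambda-\gamma} = \frac{\mu-\lambda}{\mu}\,\frac{\mu-\gamma}{\mu-\lambda-\gamma}.$$

From the preceding proposition we get for our special case

$$\mathbb{E}(e^{\gamma W}) = \frac{(1-\rho)\gamma}{\lambda + \gamma - \lambda\mu/(\mu-\gamma)} = \frac{\mu-\lambda}{\mu}\,\frac{\gamma(\mu-\gamma)}{(\lambda+\gamma)(\mu-\gamma) - \lambda\mu}$$

and we see that the two are equal, since $(\lambda+\gamma)(\mu-\gamma) - \lambda\mu = \gamma(\mu-\lambda-\gamma)$. We conclude
by the Uniqueness Theorem for moment generating functions. □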
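The algebra in this proof is easy to confirm numerically. The sketch below evaluates the general formula of Proposition 106 with exponential service times and compares it with the moment generating function of the distribution in Corollary 107; names are illustrative, and $\gamma$ must satisfy $\gamma < \mu - \lambda$ and $\gamma \ne 0$.

```python
def waiting_time_mgf(gamma, lam, service_mgf, mean_service):
    """E(exp(gamma W)) = (1 - rho) gamma / (lam + gamma - lam E(exp(gamma G)))."""
    rho = lam * mean_service
    return (1.0 - rho) * gamma / (lam + gamma - lam * service_mgf(gamma))


def mm1_waiting_time_mgf(gamma, lam, mu):
    """MGF of the mixture: atom 1 - rho at 0, and rho times an Exp(mu - lam) law."""
    rho = lam / mu
    return (1.0 - rho) + rho * (mu - lam) / (mu - lam - gamma)


if __name__ == "__main__":
    lam, mu = 1.0, 2.0

    def exp_service(g):
        return mu / (mu - g)  # MGF of Exp(mu), valid for g < mu

    for gamma in (-1.0, -0.3, 0.5):
        print(gamma,
              waiting_time_mgf(gamma, lam, exp_service, 1.0 / mu),
              mm1_waiting_time_mgf(gamma, lam, mu))
```

For $\lambda = 1$, $\mu = 2$ and $\gamma = 0.5$ both expressions evaluate to $1.5$.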



14.3 G/M/1 queues 

For G/M/1 queues, the arrival process is a renewal process. Clearly, by the renewal
property and by the lack of memory property of the service times, the queue length
process $X$ starts afresh after each arrival, i.e. $\widehat{U}_n = X_{A_n}$, $n \ge 0$, is a Markov chain on
$\{1, 2, 3, \dots\}$, where $A_n$ is the $n$th arrival time. It is actually more natural to consider the
Markov chain $U_n = \widehat{U}_n - 1 = X_{A_n-}$ on $\mathbb{N}$.

It can be shown that for M/M/1 queues the invariant distribution of $U$ is the same
as the invariant distribution of $V$ and of $X$. For general G/M/1 queues we get

Proposition 108 Let $\rho = 1/(\mu\,\mathbb{E}(A_1))$ be the traffic intensity. If $\rho < 1$, then $U$ has a
unique invariant distribution given by

$$\xi_k = (1-q)\,q^k, \qquad k \in \mathbb{N},$$

where $q$ is the smallest positive root of $q = \mathbb{E}(e^{\mu(q-1)A_1})$.






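Proof: First note that given an interarrival time $Y = y$, a $\mathrm{Poi}(\mu y)$ number of customers
are served (as long as customers are present), so $U$ has transition probabilities

$$a_{i,i+1-j} = \mathbb{E}\left(\frac{(\mu Y)^j}{j!}\, e^{-\mu Y}\right), \quad j = 0, \dots, i; \qquad a_{i,0} = 1 - \sum_{j=0}^{i} a_{i,i+1-j}.$$

Now for any geometric $\xi$, we get, for $k \ge 1$, from Tonelli's theorem,

$$\sum_{i=k-1}^{\infty} \xi_i a_{ik} = \sum_{j=0}^{\infty} \xi_{j+k-1}\, a_{j+k-1,k} = \sum_{j=0}^{\infty} (1-q)\,q^{j+k-1}\, \mathbb{E}\left(\frac{(\mu Y)^j}{j!}\, e^{-\mu Y}\right) = (1-q)\,q^{k-1}\, \mathbb{E}(e^{-\mu Y (1-q)}),$$

and clearly this equals $\xi_k = (1-q)q^k$ if and only if $q = \mathbb{E}(e^{\mu(q-1)Y}) =: f(q)$, as required.
Note that both sides are continuously differentiable on $[0, 1)$, and on $[0, 1]$ if and only if
the limits as $q \uparrow 1$ are finite; $f(0) > 0$, $f(1) = 1$ and $f'(1) = \mathbb{E}(\mu Y) = 1/\rho$, so there is a solution
if $\rho < 1$, since then $f(1-\varepsilon) < 1 - \varepsilon$ for $\varepsilon$ small enough. The solution is unique, since
there is at most one stationary distribution for the irreducible Markov chain $U$. The case
$k = 0$ can be checked by a similar computation, so $\xi$ is indeed a stationary distribution.

□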
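The root $q$ in Proposition 108 rarely has a closed form, but it is easy to compute. The sketch below uses plain fixed-point iteration started at 0; since $f$ is increasing with $f(0) > 0$, the iterates increase towards the smallest positive fixed point. This is one convenient numerical method chosen for illustration, not a method taken from the notes, and the two example interarrival laws (exponential and deterministic) are assumptions made here.

```python
import math


def smallest_root(f, iterations=500):
    """Iterate q <- f(q) from q = 0; converges upwards to the smallest fixed point."""
    q = 0.0
    for _ in range(iterations):
        q = f(q)
    return q


def mm1_f(lam, mu):
    """f(q) = E(exp(mu (q - 1) Y)) for Y ~ Exp(lam)."""
    return lambda q: lam / (lam + mu * (1.0 - q))


def dm1_f(a, mu):
    """f(q) = E(exp(mu (q - 1) Y)) for deterministic interarrival times Y = a."""
    return lambda q: math.exp(mu * a * (q - 1.0))


if __name__ == "__main__":
    print("M/M/1, lam = 1, mu = 2:", smallest_root(mm1_f(1.0, 2.0)))
    print("D/M/1, a = 1, mu = 2:  ", smallest_root(dm1_f(1.0, 2.0)))
```

For exponential interarrival times the iteration recovers $q = \rho$, in line with the remark that $U$, $V$ and $X$ share the same invariant distribution for M/M/1 queues.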

Proposition 109 The waiting time $W$ of a customer arriving in equilibrium has distribution

$$\mathbb{P}(W = 0) = 1 - q, \qquad \mathbb{P}(W > w) = q\, e^{-\mu(1-q)w}, \quad w \ge 0.$$

Proof: In equilibrium, an arriving customer finds a number $N \sim \xi$ of customers in the
queue in front of him, each with a service time $G_j \sim \mathrm{Exp}(\mu)$. Clearly $\mathbb{P}(W = 0) = \xi_0 = 1 - q$.
Also, since the conditional distribution of $N$ given $N \ge 1$ is geometric with parameter $q$
and geometric sums of exponential random variables are exponential, we have that $W$
given $N \ge 1$ is exponential with parameter $\mu(1-q)$. □

Alternatively, we can write this proof in formulas as a calculation of $\mathbb{P}(W > w)$ by
conditioning on $N$:

$$\begin{aligned}
\mathbb{P}(W > w) &= \sum_{n=0}^{\infty} \mathbb{P}(N = n)\,\mathbb{P}(W > w \mid N = n) \\
&= 0 + \sum_{n=1}^{\infty} q^n (1-q) \int_w^\infty \frac{\mu^n}{(n-1)!}\, x^{n-1} e^{-\mu x}\,dx \\
&= \int_w^\infty e^{-\mu x}\, q \mu (1-q) \sum_{n=1}^{\infty} \frac{\mu^{n-1}}{(n-1)!}\, q^{n-1} x^{n-1}\,dx \\
&= q \int_w^\infty \mu(1-q) \exp\{-\mu x + \mu q x\}\,dx = q \exp\{-\mu(1-q)w\},
\end{aligned}$$

where we used that the sum of $n$ independent identically exponentially distributed random
variables is Gamma distributed.






Lecture 15 



Markov models for insurance 



Reading: Ross 7.10; CT4 Unit 6 
Further reading: Norris 5.3 



15.1 The insurance ruin model 

Insurance companies deal with large numbers of insurance policies at risk. They are
grouped according to type and various other factors into so-called portfolios. Let us
focus on such a portfolio and model the associated claim processes, the claim sizes and
the reserve process. We make the following assumptions.

• Claims arrive according to a Poisson process $(X_t)_{t \ge 0}$ with rate $\lambda$.

• Claim amounts $(A_j)_{j \ge 1}$ are positive, independent of the arrival process and identically
distributed with common probability density function $k(a)$, $a > 0$, and mean
$\mu = \int_0^\infty a\,k(a)\,da$.

• The insurance company provides an initial reserve of $u \ge 0$ money units.

• Premiums are paid continuously at constant rate $c$ generating a linear premium
income accumulating to $ct$ at time $t$. We assume $c > \lambda\mu$ to have more premium
income than claim outgo, on average.

• We ignore all expenses and other influences.

In this setting, we define the following objects of interest:

• The aggregate claims process $C_t = \sum_{n=1}^{X_t} A_n$, $t \ge 0$.

• The reserve process $R_t = u + ct - C_t$, $t \ge 0$.

• The ruin probability $\psi(u) = \mathbb{P}_u(R_t < 0 \text{ for some } t \ge 0)$, as a function of $R_0 = u \ge 0$.
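15.2 Aggregate claims and reserve processes

Proposition 110 $C$ and $R$ have stationary independent increments. Their moment
generating functions are given by

$$\mathbb{E}(e^{\gamma C_t}) = \exp\left\{\lambda t \int_0^\infty (e^{\gamma a} - 1)\,k(a)\,da\right\}$$

and

$$\mathbb{E}(e^{\beta R_t}) = \exp\left\{\beta u + \beta c t - \lambda t \int_0^\infty (1 - e^{-\beta a})\,k(a)\,da\right\}.$$

Proof: First calculate the moment generating function of $C_t$:

$$\begin{aligned}
\mathbb{E}(e^{\gamma C_t}) &= \mathbb{E}\left(\exp\left\{\gamma \sum_{j=1}^{X_t} A_j\right\}\right) = \sum_{n \in \mathbb{N}} \mathbb{E}\left(\exp\left\{\gamma \sum_{j=1}^{n} A_j\right\}\right) \mathbb{P}(X_t = n) \\
&= \sum_{n \in \mathbb{N}} \left(\mathbb{E}(e^{\gamma A_1})\right)^n \frac{(\lambda t)^n}{n!}\, e^{-\lambda t} = \exp\left\{\lambda t \left(\mathbb{E}(e^{\gamma A_1}) - 1\right)\right\}
\end{aligned}$$

which in the case where $A_1$ has a density $k$, gives the formula required. The same calculation
for the joint moment generating function of $C_t$ and $C_{t+s} - C_t$, or more increments,
yields stationarity and independence of increments (only using the stationarity and independence
of increments of $X$, and the independence of the $(A_j)_{j \ge 1}$).

The statements for $R$ follow easily with $\beta = -\gamma$. □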

The moment generating function is useful to calculate moments.

Example 111 We differentiate the moment generating functions at zero to obtain

$$\mathbb{E}(C_t) = \left.\frac{\partial}{\partial\gamma} \exp\left\{\lambda t \left(\mathbb{E}(e^{\gamma A_1}) - 1\right)\right\}\right|_{\gamma=0} = \lambda t \left.\frac{\partial}{\partial\gamma}\, \mathbb{E}(e^{\gamma A_1})\right|_{\gamma=0} = \lambda t \mu$$

and $\mathbb{E}(R_t) = u + ct - \lambda t \mu = u + (c - \lambda\mu)t$. Note that the Strong Law of Large Numbers,
applied to increments $Z_n = R_n - R_{n-1}$, yields

$$\frac{R_n}{n} = \frac{u}{n} + \frac{1}{n} \sum_{j=1}^{n} Z_j \to \mathbb{E}(Z_1) = c - \lambda\mu > 0 \quad \text{a.s., as } n \to \infty,$$

confirming our claim that $c > \lambda\mu$ means that, on average, there is more premium income
than claim outgo. In particular, this implies $R_n \to \infty$ a.s. as $n \to \infty$. Does this imply
that $\inf\{R_s : 0 \le s < \infty\} > -\infty$? No. It is conceivable that between integers, the reserve
process takes much smaller values. But we will show that $\inf\{R_s : 0 \le s < \infty\} > -\infty$.
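The differentiation in Example 111 can be mimicked numerically with a central difference. The sketch below assumes exponential claim sizes with mean $\mu$, for which $\mathbb{E}(e^{\gamma A_1}) = 1/(1-\gamma\mu)$ when $\gamma < 1/\mu$; this distributional choice and all function names are illustrative assumptions.

```python
import math


def aggregate_claims_mgf(gamma, lam, t, claim_mgf):
    """E(exp(gamma C_t)) = exp(lam t (E(exp(gamma A_1)) - 1))."""
    return math.exp(lam * t * (claim_mgf(gamma) - 1.0))


def mean_from_mgf(mgf, h=1e-5):
    """Central-difference approximation of the derivative of an MGF at zero."""
    return (mgf(h) - mgf(-h)) / (2.0 * h)


def exponential_claim_mgf(mu):
    """MGF of an exponential claim with mean mu (rate 1/mu), valid for gamma < 1/mu."""
    return lambda gamma: 1.0 / (1.0 - gamma * mu)


if __name__ == "__main__":
    lam, t, mu = 2.0, 3.0, 1.5
    claim_mgf = exponential_claim_mgf(mu)
    approx = mean_from_mgf(lambda g: aggregate_claims_mgf(g, lam, t, claim_mgf))
    print("numerical E(C_t):", approx, " lambda * t * mu:", lam * t * mu)
```

The numerical derivative agrees with $\lambda t \mu$ up to the discretisation error of the difference quotient.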
There are other random walks that are embedded in the reserve process:

Example 112 Consider the process at claim times $W_n = R_{T_n}$, $n \ge 0$, where $(T_n)_{n \ge 0}$ are
the event times of the Poisson process (and $T_0 = 0$). Now

$$W_{n+1} - W_n = R_{T_{n+1}} - R_{T_n} = c(T_{n+1} - T_n) - A_{n+1}, \qquad n \ge 0,$$

are also independent identically distributed increments with $\mathbb{E}(W_{n+1} - W_n) = c/\lambda - \mu > 0$,
and the Strong Law of Large Numbers yields

$$\frac{W_n}{n} = \frac{u}{n} + \frac{1}{n} \sum_{j=0}^{n-1} (W_{j+1} - W_j) \to \frac{c}{\lambda} - \mu \quad \text{a.s. as } n \to \infty.$$

Again, we conclude $W_n \to \infty$, but note that the $W_n$ are the local minima of $R$, so

$$I_\infty := \inf\{R_t : 0 \le t < \infty\} = \inf\{W_n : n \ge 0\} > -\infty.$$

As a consequence, if we denote $R^0_t = ct - C_t$ with associated $I^0_\infty$, then

$$\psi(u) = \mathbb{P}_u(R_t < 0 \text{ for some } t \ge 0) = \mathbb{P}(I^0_\infty < -u) \to \mathbb{P}(I^0_\infty = -\infty) = 0$$

as $u \to \infty$, i.e. $\psi(\infty) = 0$.

15.3 Ruin probabilities 

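We now turn to studying the ruin probabilities $\psi(u)$, $u \ge 0$.

Proposition 113 The ruin probabilities $\psi(u)$ satisfy the renewal equation

$$\psi(x) = g(x) + \int_0^x \psi(x-y)\,f(y)\,dy, \qquad x \ge 0,$$

where

$$f(y) = \frac{\lambda}{c}\,\overline{K}(y) = \frac{\lambda}{c} \int_y^\infty k(x)\,dx \qquad \text{and} \qquad g(x) = \frac{\lambda\mu}{c}\,\overline{K}_0(x) = \frac{\lambda}{c} \int_x^\infty \overline{K}(y)\,dy.$$

Proof: Condition on $T_1 \sim \mathrm{Exp}(\lambda)$ and $A_1 \sim k(a)$ to obtain

$$\begin{aligned}
\psi(x) &= \int_0^\infty \int_0^\infty \psi(x + ct - a)\,k(a)\,da\; \lambda e^{-\lambda t}\,dt \\
&= \int_x^\infty \frac{\lambda}{c}\, e^{-\lambda(s-x)/c} \int_0^\infty \psi(s-a)\,k(a)\,da\,ds,
\end{aligned}$$

where we substitute $s = x + ct$ and use the convention that $\psi(x) = 1$ for $x < 0$.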
Differentiation w.r.t. $x$ yields

$$\begin{aligned}
\psi'(x) &= \frac{\lambda}{c}\,\psi(x) - \frac{\lambda}{c} \int_0^\infty \psi(x-a)\,k(a)\,da \\
&= \frac{\lambda}{c}\,\psi(x) - \frac{\lambda}{c} \int_0^x \psi(x-a)\,k(a)\,da - \frac{\lambda}{c}\,\overline{K}(x).
\end{aligned}$$

Note that we also have a terminal condition $\psi(\infty) = 0$. With this terminal condition,
this integro-differential equation has a unique solution. It therefore suffices to check that
any solution of the renewal equation also solves the integro-differential equation.

For the renewal equation, we only sketch the argument since the technical details
would distract from the main steps: note that differentiation (we skip the details for
differentiation under the integral sign!) yields (setting $s = x - y$ in the convolution
integral)

$$\begin{aligned}
\psi'(x) &= g'(x) + \psi(x) f(0) + \int_0^x \psi(s)\,f'(x-s)\,ds \\
&= -\frac{\lambda}{c}\,\overline{K}(x) + \frac{\lambda}{c}\,\psi(x) - \frac{\lambda}{c} \int_0^x \psi(x-a)\,k(a)\,da,
\end{aligned}$$

and note also that, with the convention $\psi(x) = 1$ for $x < 0$, we can write the renewal
equation as

$$\psi(x) = \int_0^\infty \psi(x-y)\,f(y)\,dy,$$

where $f$ is a nonnegative function with $\int_0^\infty f(y)\,dy = \lambda\mu/c < 1$, so for any nonnegative
solution $\psi \ge 0$, $\psi(x)$ is less than an average of $\psi$ on $(-\infty, x]$, and hence $\psi$ is decreasing
(this requires a bit more care), so $\psi(\infty)$ exists with (by monotone convergence using
$\psi(x-y) \downarrow \psi(\infty)$ as $x \to \infty$)

$$\psi(\infty) = \lim_{x \to \infty} \int_0^\infty \psi(x-y)\,f(y)\,dy = \int_0^\infty \psi(\infty)\,f(y)\,dy = \frac{\lambda\mu}{c}\,\psi(\infty) \;\Longrightarrow\; \psi(\infty) = 0.$$

□

Example 114 We can calculate $\psi(0) = g(0) = \lambda\mu/c$. In particular, zero initial reserve
does not entail ruin with probability 1. In other words, $\psi$ jumps at $u = 0$ from $\psi(0-) = 1$
to $\psi(0) = \psi(0+) = \lambda\mu/c < 1$.

Corollary 115 If $\lambda\mu < c$, then $\psi$ is given by

$$\psi(x) = g(x) + \int_0^x g(x-y)\,u(y)\,dy \qquad \text{where} \qquad u(y) = \sum_{n \ge 1} f^{*(n)}(y).$$

Proof: This is an application of Exercise A.6.2(c), the general solution of the renewal
equation. Note that $f$ is not a probability density for $\lambda\mu < c$, but the results (and
arguments) are valid for nonnegative $f$ with $\int_0^\infty f(y)\,dy \le 1$. □
Where is the renewal process? For $\lambda\mu < c$, there is no renewal process with interarrival
density $f$ in the strict sense, since $f$ is not a probability density function. One can
associate a defective renewal process that only counts a geometric number of points, and
the best way to motivate this is by looking at $\lambda\mu = c$, where the situation is nicer. It can
be shown that the renewal process is counting new minima of $R$, not in the time
parameterisation of $R_t$, but in the height variable, i.e.

$$Y_h = \#\{n \ge 1 : W_n \in [-h, 0] \text{ and } W_n = \min\{W_0, \dots, W_n\}\}, \qquad h \ge 0,$$

is a renewal process. Note that $f$ is the density of $LU$ where $L$ is a size-biased claim
and $U \sim \mathrm{Unif}(0, 1)$. Intuitively, this is because big claims are more likely to exceed the
previous minimal reserve level, hence size-biased $L$, but the previous level will only be
exceeded by a fraction $LU$, since $R$ will not be at its minimum when the claim arrives.

So what happens if $\lambda\mu < c$? There will only be a finite number of claims that exceed
the previous minimal reserve level since now $R_t \to \infty$, and $Y$ remains constant for any
lower levels of $h$.

This is not very explicit. To conclude, let us derive more explicit estimates of $\psi$.



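Proposition 116 Assume that there is $\alpha > 0$ such that

$$1 = \int_0^\infty e^{\alpha y} f(y)\,dy = \frac{\lambda}{c} \int_0^\infty e^{\alpha y}\,\overline{K}(y)\,dy.$$

Then there is a constant $C > 0$ such that

$$\psi(x) \sim C e^{-\alpha x} \qquad \text{as } x \to \infty.$$

Proof: Define a probability density function $\widehat{f}(y) = e^{\alpha y} f(y)$, and $\widehat{g}(y) = e^{\alpha y} g(y)$ and
$\widehat{\psi}(x) = e^{\alpha x} \psi(x)$. Then $\widehat{\psi}$ satisfies

$$\widehat{\psi}(x) = \widehat{g}(x) + \int_0^x \widehat{\psi}(x-y)\,\widehat{f}(y)\,dy.$$

The solution (obtained as in Corollary 115) converges by the key renewal theorem:

$$\widehat{\psi}(x) = \widehat{g}(x) + \int_0^x \widehat{g}(x-y)\,\widehat{u}(y)\,dy \to \frac{1}{\widehat{\mu}} \int_0^\infty \widehat{g}(y)\,dy =: C \qquad \text{as } x \to \infty,$$

where

$$\widehat{u}(x) = \sum_{n \ge 1} \widehat{f}^{*(n)}(x)$$

and $\widehat{\mu} = \int_0^\infty y\,\widehat{f}(y)\,dy$ is the mean of $\widehat{f}$. Note that $\widehat{g}$ is not necessarily non-increasing,
but it can be checked that it is integrable, and a version of the key renewal theorem still applies. □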
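The exponent $\alpha$ in Proposition 116 is defined only implicitly, so in practice it is found numerically. The following sketch uses bisection, assuming the integral $\int_0^\infty e^{\alpha y} f(y)\,dy$ is available as an increasing function of $\alpha$. For illustration it takes exponential claims with mean $\mu$, where $\overline{K}(y) = e^{-y/\mu}$ and the integral equals $(\lambda/c)/(1/\mu - \alpha)$ for $\alpha < 1/\mu$; the names and parameter values are assumptions made here.

```python
def adjustment_coefficient(integral, lo, hi, tol=1e-12):
    """Bisection for alpha in (lo, hi) with integral(alpha) = 1.

    Assumes integral is increasing with integral(lo) < 1 < integral(hi).
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if integral(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exponential_claims_integral(lam, c, mu):
    """alpha -> (lam / c) * int_0^inf exp(alpha y) exp(-y / mu) dy, alpha < 1 / mu."""
    return lambda alpha: (lam / c) / (1.0 / mu - alpha)


if __name__ == "__main__":
    lam, c, mu = 1.0, 2.0, 1.5
    alpha = adjustment_coefficient(exponential_claims_integral(lam, c, mu), 0.0, 0.6)
    print("alpha:", alpha, " closed form 1/mu - lam/c:", 1.0 / mu - lam / c)
```

For exponential claims the root is $\alpha = 1/\mu - \lambda/c = (c - \lambda\mu)/(c\mu)$, which is the decay rate that appears explicitly in Example 117.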
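Example 117 If $A_n \sim \mathrm{Exp}(1/\mu)$, then in the notation of Proposition 113

$$g(x) = \frac{\lambda\mu}{c}\, e^{-x/\mu} \qquad \text{and} \qquad f(y) = \frac{\lambda}{c}\, e^{-y/\mu}$$

so that the renewal equation becomes

$$\psi(x) = \frac{\lambda\mu}{c}\, e^{-x/\mu} + \frac{\lambda}{c} \int_0^x \psi(x-y)\, e^{-y/\mu}\,dy.$$

In particular $\psi(0) = \lambda\mu/c$. After differentiation and cancellation

$$\psi'(x) = \left(\frac{\lambda}{c} - \frac{1}{\mu}\right) \psi(x) \;\Longrightarrow\; \psi(x) = \frac{\lambda\mu}{c} \exp\left\{-\frac{c - \lambda\mu}{c\mu}\, x\right\}.$$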
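One way to gain confidence in the closed form of Example 117 is to substitute it back into the renewal equation and evaluate the convolution integral numerically. The sketch below does so with a composite trapezoidal rule; the step count, parameter values and function names are illustrative choices.

```python
import math


def ruin_probability(x, lam, c, mu):
    """psi(x) = (lam mu / c) exp(-(c - lam mu) x / (c mu)) for exponential claims."""
    return (lam * mu / c) * math.exp(-(c - lam * mu) / (c * mu) * x)


def renewal_residual(x, lam, c, mu, steps=20000):
    """psi(x) - g(x) - int_0^x psi(x - y) f(y) dy, using the trapezoidal rule."""
    g = (lam * mu / c) * math.exp(-x / mu)

    def integrand(y):
        f = (lam / c) * math.exp(-y / mu)
        return ruin_probability(x - y, lam, c, mu) * f

    h = x / steps
    total = 0.5 * (integrand(0.0) + integrand(x))
    for i in range(1, steps):
        total += integrand(i * h)
    return ruin_probability(x, lam, c, mu) - g - total * h


if __name__ == "__main__":
    lam, c, mu = 1.0, 2.0, 1.5
    for x in (0.5, 2.0, 5.0):
        print(x, ruin_probability(x, lam, c, mu), renewal_residual(x, lam, c, mu))
```

The residuals are of the order of the quadrature error, so the closed form is consistent with the renewal equation of Proposition 113.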



15.4 Some simple finite-state-space models 

Example 118 (Sickness-death) In health insurance, the following model arises. Let
$\mathbb{S} = \{H, S, \Delta\}$ consist of the states healthy, sick and dead. Clearly, $\Delta$ is absorbing. All
other transitions are possible, at different rates. Under the assumption of full recovery
after sickness, the state of health of the insured can be modelled by a continuous-time
Markov chain.

Example 119 (Multiple decrement model) A life assurance often pays benefits not
only upon death but also when a critical illness or certain losses of limbs, sensory losses
or other disability are suffered. The assurance is not usually terminated upon such an
event.

Example 120 (Marital status) Marital status has a non-negligible effect for various
insurance types. The state space is $\mathbb{S} = \{B, M, D, W, \Delta\}$ to model bachelor, married,
divorced, widowed, dead. Not all direct transitions are possible.

Example 121 (No claims discount) In automobile and some other general insurances,
you get a discount on your premium depending on the number of years without (or with at most
one) claim. This gives rise to a whole range of models, e.g. $\mathbb{S} = \{0\%, 20\%, 40\%, 50\%, 60\%\}$.

In all these examples, the exponential holding times are not particularly realistic. 
There are usually costs associated either with the transitions or with the states. Also, es- 
timation of transition rates is of importance. A lot of data are available and sophisticated 
methods have been developed. 



Lecture 16 

Conclusions and Perspectives 



16.1 Summary of the course 

This course was about stochastic process models $X = (X_t)_{t \ge 0}$ in continuous time and
(mostly) a discrete state space $\mathbb{S}$, often $\mathbb{N}$. Applications include those where $X$ describes

• counts of births, atoms, bacteria, visits, trials, arrivals, departures, insurance claims, 
etc. 

• the size of a population, the number of buses in service, the length of a queue 

and others can be added. Important is the real structure, the real transition mecha- 
nism that we wish to model by X. Memory plays an important role. We distinguish 

• Markov property (lack of memory, exponential holding times; past irrelevant for 
the future except for the current state) 

• Renewal property (information on previous states irrelevant, but duration in state 
relevant; Markov property at transition times) 

• Stationarity, equilibrium (behaviour homogeneous in time; for Markov chains, in- 
variant marginal distribution; for renewal processes, stationary increments) 

• Independence (of individuals in population models, of counts over disjoint time 
intervals, etc.) 

Once we are happy that such conditions are met, we have a model X for the real 
process, and we study it under our model assumptions. We study 

• different descriptions of X (jump chain - holding times, transition probabilities - 
forward-backward equations, infinitesimal behaviour) 

• convergence to equilibrium (invariant distributions, convergence of transition prob- 
abilities, ergodic theorem; strong law and CLT of renewal theory, renewal theorems) 

• hitting times, excess life, recurrence times, waiting times, ruin probabilities 
• odd behaviour (explosion, transience, arithmetic interarrival times)

Techniques

• Conditioning, one-step analysis 

• detailed balance equations 

• algebra of limits for almost sure convergence 



16.2 Duration-dependent transition rates 

Renewal processes can be thought of as having duration-dependent transition rates. If the
interarrival distribution is not exponential, then (at least some) residual distributions will
not be the same as the full interarrival distribution, but we can still express, say for $Z$
with density $f$, that

$$\mathbb{P}(Z - t > s \mid Z > t) = \frac{\mathbb{P}(Z > t+s)}{\mathbb{P}(Z > t)} \qquad \text{and} \qquad f_{Z-t \mid Z>t}(s) = \frac{f(t+s)}{\mathbb{P}(Z > t)}.$$

If we define

$$\lambda(t) = \frac{f(t)}{\mathbb{P}(Z > t)} = \frac{f(t)}{\overline{F}(t)},$$

where $\overline{F}(t) = \mathbb{P}(Z > t)$ and in particular $\overline{F}(0) = 1$, we can write

$$\overline{F}(t) = \exp\left\{-\int_0^t \lambda(s)\,ds\right\} \qquad \text{and} \qquad f(t) = \lambda(t) \exp\left\{-\int_0^t \lambda(s)\,ds\right\}.$$

We can then also express the residual distributions in terms of $\lambda(s)$:

$$\mathbb{P}(Z - t > s \mid Z > t) = \exp\left\{-\int_t^{t+s} \lambda(r)\,dr\right\}.$$

$\lambda(t)$ can be interpreted as the instantaneous arrival rate at time $t$ after the previous arrival.
Similarly, we can use this idea in Markov models and split a holding rate $\lambda_i(d)$ depending
on the duration $d$ of the current visit to state $i$ into transition rates $\lambda_i(d) = \sum_{j \ne i} q_{ij}(d)$.
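The passage from the rate $\lambda(t)$ back to the survival function can be illustrated numerically. The sketch below integrates a given rate function with the trapezoidal rule; the two example rates (linear, giving $\overline{F}(t) = e^{-t^2}$, and constant, giving the memoryless exponential case) are assumptions chosen for illustration, as are the function names.

```python
import math


def survival_from_rate(rate, t, steps=10000):
    """F_bar(t) = exp(-int_0^t rate(s) ds), with a trapezoidal approximation."""
    h = t / steps
    integral = 0.5 * (rate(0.0) + rate(t))
    for i in range(1, steps):
        integral += rate(i * h)
    return math.exp(-integral * h)


def residual_survival(rate, t, s, steps=10000):
    """P(Z - t > s | Z > t) = F_bar(t + s) / F_bar(t)."""
    return survival_from_rate(rate, t + s, steps) / survival_from_rate(rate, t, steps)


if __name__ == "__main__":
    def linear_rate(t):
        return 2.0 * t  # F_bar(t) = exp(-t**2)

    def constant_rate(t):
        return 0.7  # exponential case: the residual life does not depend on t

    print(survival_from_rate(linear_rate, 1.5), math.exp(-2.25))
    print(residual_survival(constant_rate, 2.0, 1.0), math.exp(-0.7))
    print(residual_survival(linear_rate, 2.0, 1.0), residual_survival(linear_rate, 0.0, 1.0))
```

With the constant rate the residual survival probability is the same for every elapsed duration $t$, whereas with the increasing rate it shrinks as $t$ grows, which is exactly the duration dependence described above.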



16.3 Time-dependent transition rates 

A different type of varying transition intensities is obtained if we make the rates $\lambda(t)$
dependent on global time $t$. Here, the time passed in a given state is irrelevant, but only
the actual time matters. This is useful to model seasonal effects. E.g. the intensity of
road accidents may be considered higher in winter than in summer. So, a Poisson process
to model this could have intensity $\lambda(t) = \lambda_0 + \lambda_1 \cos(2\pi t)$. This can also be built into
birth- and death-rates of population models, insurance models etc.
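A Poisson process with the seasonal intensity $\lambda(t) = \lambda_0 + \lambda_1 \cos(2\pi t)$ can be simulated by thinning: generate candidate points from a homogeneous Poisson process at the upper bound $\lambda_0 + |\lambda_1|$ and keep a candidate at time $t$ with probability $\lambda(t)$ divided by that bound. Thinning is a standard simulation technique brought in here for illustration; it is not discussed in these notes, and the sketch assumes $\lambda_0 \ge |\lambda_1|$ so that the intensity is nonnegative.

```python
import math
import random


def seasonal_poisson_times(lam0, lam1, horizon, seed=0):
    """Event times on [0, horizon] of a Poisson process with rate lam0 + lam1 cos(2 pi t)."""
    rng = random.Random(seed)
    bound = lam0 + abs(lam1)  # dominating constant rate
    t = 0.0
    times = []
    while True:
        t += rng.expovariate(bound)  # next candidate of the dominating process
        if t > horizon:
            return times
        rate = lam0 + lam1 * math.cos(2.0 * math.pi * t)
        if rng.random() * bound < rate:  # accept with probability rate / bound
            times.append(t)


if __name__ == "__main__":
    times = seasonal_poisson_times(lam0=5.0, lam1=3.0, horizon=1000.0, seed=1)
    print("number of events:", len(times), " expected about:", 5.0 * 1000.0)
```

Over a whole number of periods the cosine term integrates to zero, so the expected number of events on $[0, T]$ is $\lambda_0 T$.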
16.4 Spatial processes

In the case of Poisson counts, one can also look at intensity functions on $\mathbb{R}^2$ or $\mathbb{R}^d$ and
look at "arrivals" as random points in the plane, with

$$N([0,t] \times [0,z]) = X(t,z) \sim \mathrm{Poi}\left(\int_0^t \int_0^z \lambda(s,y)\,dy\,ds\right)$$

and such that counts in disjoint rectangles are independent Poisson variables.

16.5 Markov processes in uncountable state spaces
($\mathbb{R}$ or $\mathbb{R}^d$)

We have come across some processes for which we could have proved a Markov property:
the age process $(A_t)_{t \ge 0}$ of a renewal process, the excess process $(E_t)_{t \ge 0}$ of a renewal
process, but also the processes $(C_t)_{t \ge 0}$ and $(R_t)_{t \ge 0}$ with stationary independent increments
that arose in insurance ruin by combining Poisson arrival times with jump sizes. A systematic
study of such Markov processes in $\mathbb{R}$ is technically much harder, although many
ideas and results transfer from our countable state space model.

Diffusion processes as a special class of such Markov processes are studied in a Finance
context in B10b, in the context of Stochastic Differential Equations in a Part C course,
and in a Genetics context in another Part C course.

16.6 Levy processes 

A particularly nice class of processes are processes with stationary independent increments,
so-called Levy processes. If you have learned or are about to learn about Brownian
motion in another course (B10b), then you know most Levy processes since a general
Levy process can be written as $X_t = \mu t + \sigma B_t + C_t$ where $\mu$ is a drift coefficient, $\sigma$
a scale parameter for the Brownian motion process $B$, and $C$ is a limit of compound
Poisson processes like the claim size process above. In fact, $C$ may have infinitely many
jumps in a finite interval, that are summable in some sense, but not necessarily absolutely
summable, but these jumps can be described by a family of independent Poisson processes
with associated independent jump heights, in fact a Poisson measure on $[0, \infty) \times \mathbb{R}^*$ with
intensity function ...

There is a Part C course on Levy processes in Finance. 

16.7 Stationary processes 

We have come across stationary Markov chains and stationary increments of other pro- 
cesses. Stationarity is a concept that can be studied separately. In our examples, the 
dependence structure of processes was simple: independent increments, or Markovian 
dependence, independent holding times etc. More complicated dependence structures 
may be studied. 
16.8 4th year dissertation projects

Other than this, if you wish to study any of the stochastic processes or the applications
more deeply in your fourth year, there are several people in the Statistics Department
and the Mathematical Institute who would be willing to supervise you. Think about
this, maybe over the Christmas vacation, since the Easter vacation is close to the
exams. Dissertation projects for Maths&Stats students are arranged before the summer
to ensure that every student obtains a project well in advance, so you can start working on
your project during the summer. For Maths students dissertation projects are optional.
Please get in touch with me or other potential supervisors in Hilary term, if you wish to
discuss your possibilities. I am also happy to direct you to other people.



